ASI-C6-ASGBD Practice: Scripts with PostgreSQL


Objectives: Automating PostgreSQL Tasks

We are going to automate administrative tasks with different techniques and scripts. Specifically, you will use scripts written in SQL, bash and Perl, together with various options of the psql command. This practice is an example of how system administrators and DBAs need to know the console, and not be afraid of working in it, in order to automate tasks and save themselves work.

Practice development

Filling the PRODUCTS table with values supplied in CSV format

To get started, we first have to create the products table in PostgreSQL. The example uses the postgres database and the postgres user. Use the script creacio_taula_productes.sql.

We now have the table. The data are in the file products.csv, in CSV format. In this case there are only three rows, but there could be hundreds or thousands. To convert the CSV format into the INSERT INTO format that PostgreSQL understands, we use a script written in Perl: create_input_sql.pl. We integrate this script into the bash script load_new_products.sh, so that we now have the whole process: the source is the file products.csv, and at the destination we have to verify that the data have been imported into the PRODUCTS table.
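The contents of products.csv are not reproduced in this practice. A minimal example consistent with what create_input_sql.pl expects (a header line that the script discards, then the fields id, name, description and price, with double quotes around any field that contains commas) could be the following; the product values are invented for illustration:

```shell
# Hypothetical products.csv: file name from the practice, values invented.
# The first line is a header that create_input_sql.pl reads and discards.
cat > products.csv <<'EOF'
id,name,description,price
234,The Signature Widget,"A widget, signed by hand",19.95
235,The Bigger Widget,,29.95
236,The Nuclear Widget,"Glows in the dark",49.95
EOF
head -1 products.csv
# prints: id,name,description,price
```

The empty description in the second row exercises the null-handling path of the Perl script, and the quoted description in the first row exercises its comma-in-field logic.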

$ sudo ./load_new_products.sh

which runs the command:

./create_input_sql.pl products.csv

To verify that the process worked correctly, we can inspect the files load_new_products.log and runme.postgresql.
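As a reference for that check, this is the general shape of the generated runme.postgresql. The block below is illustrative only (it writes an example file rather than the real one, for a hypothetical one-row products.csv): the structure is reconstructed from the logic of create_input_sql.pl, and the actual insert lines depend on your CSV.

```shell
# Illustrative only: the kind of file create_input_sql.pl produces.
# Note the ON_ERROR_STOP guard at the top and the single begin/commit
# transaction wrapped around all the inserts.
cat > runme.postgresql.example <<'EOF'
\set ON_ERROR_STOP 1
begin;
insert into products (id, name, description, price) values (234, 'The Signature Widget', null, 19.95);
commit;
EOF
head -1 runme.postgresql.example
```

If your runme.postgresql is missing the begin/commit pair or the ON_ERROR_STOP line, the Perl script did not run to completion.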

Generating the sales reports

First of all, some sample data for the sales table:

insert into sales (first_name, last_name, city, state, zip, email_address, product_id, product_name, product_sale_date) values ('John', 'Hancock', 'Medford', 'MA', '02145', 'john_hancock@pretendmail.com', 234, 'The Signature Widget', '2004-10-13');

insert into sales (first_name, last_name, city, state, zip, email_address, product_id, product_name, product_sale_date) values ('Bob', 'Smith', 'Boston', 'MA', '02123', 'bob_smith@pretendmail.com', 234, 'The Bigger Widget', '2004-10-14');

insert into sales (first_name, last_name, city, state, zip, email_address, product_id, product_name, product_sale_date) values ('Jane', 'Doe', 'Somerville', 'MA', '02144', 'jane_doe@pretendmail.com', 234, 'The Nuclear Widget', '2004-10-15');

insert into sales (first_name, last_name, city, state, zip, email_address, product_id, product_name, product_sale_date) values ('John', 'Adams', 'Cambridge', 'MA', '02122', 'john_adams@pretendmail.com', 234, 'The Beer Widget', '2004-10-16');

insert into sales (first_name, last_name, city, state, zip, email_address, product_id, product_name, product_sale_date) values ('Joe', 'Random', 'New York', 'NY', '02122', 'john_smith@pretendmail.com', 234, 'The Stylish Widget', '2004-10-17');

Now we can generate the reports. We will do it with a script that takes two parameters, the start and end dates for the report. For example:

./gen_sales_report_param.sh 2004/10/14 2004/10/16

where gen_sales_report_param.sh is:

#!/bin/sh

# gen_sales_report_param.sh
PGUSER=postgres
PGPASSWORD=postgres
PGHOST=localhost
PGPORT=5432

export PGUSER PGPASSWORD PGHOST PGPORT

psql --set start_date=\'$1\' \
     --set end_date=\'$2\' \
     -f generate_monthly_sales_report.postgresql postgres

#reset PGUSER and PGPASSWORD  
PGUSER=""
PGPASSWORD=""
PGHOST=""
PGPORT=""
export PGUSER PGPASSWORD PGHOST PGPORT  
 
#End

where the file generate_monthly_sales_report.postgresql is:

-- generate_monthly_sales_report.postgresql
-- 
-- Outputs one month's worth of sales to a CSV file that
-- is easily imported into spreadsheet software.

\o report.csv
\a
\f ,
select product_id, product_name, product_sale_date from sales where 
product_sale_date >= :start_date and product_sale_date <= :end_date;
\o

The report you should obtain in the file report.csv is similar to the following (the exact rows depend on the date range you pass and on any data already in the table):

 product_id,product_name,product_sale_date
 234,The Big Widget,2004-10-13
 234,The Signature Widget,2004-10-13
 234,The Bigger Widget,2004-10-14
 234,The Nuclear Widget,2004-10-15
 234,The Beer Widget,2004-10-16
 234,The Stylish Widget,2004-10-17
(6 rows)

We choose the CSV format because it is the natural format for importing the information into spreadsheets.

To finish, cron

Finally, running these scripts and generating the reports can be automated so that they execute periodically. For example, one problem to tackle would be the automatic generation of a weekly sales report. We would have to modify the previous script and create a cron job that generates the report for us. Instead of being called report.csv, it should be called report_ddmmyyyy.csv (where ddmmyyyy is replaced by the day, month and year).
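A sketch of the two pieces involved, under the assumption that the modified script is called gen_weekly_sales_report.sh and lives in /home/postgres/scripts (both names are hypothetical): `date +%d%m%Y` produces the ddmmyyyy stamp, and a crontab entry runs the script weekly.

```shell
#!/bin/sh
# Inside the modified report script: build the date-stamped output name
# in the report_ddmmyyyy.csv form the statement asks for.
REPORT="report_$(date +%d%m%Y).csv"
echo "$REPORT"   # e.g. report_13102004.csv

# Hypothetical crontab entry (install it with "crontab -e"): run the
# modified report script every Monday at 06:00.
#   0 6 * * 1 /home/postgres/scripts/gen_weekly_sales_report.sh
```

Inside the script you would then pass $REPORT to \o (or substitute it into the .postgresql file with another --set variable) instead of hard-coding report.csv.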

Submission

Moodle

Resources

the script creacio_taula_productes.sql:

drop sequence products_row_num_seq;
 
create sequence products_row_num_seq cycle;
 
drop table products;
 
create table products (
  row_num int not null default nextval('products_row_num_seq'),
  id int primary key,
  name varchar(35) not null,
  description varchar(1024) null,
  price numeric(8, 2) not null);
 
comment on table products is 'Records of products sold at the bookstore.';
comment on column products.row_num is 'For de-duping purposes.';

drop sequence sales_row_num_seq;

create sequence sales_row_num_seq cycle;

drop table sales;

create table sales (
  row_num int not null default nextval('sales_row_num_seq'),
  first_name varchar(35) not null,
  middle_name varchar(35) null,
  last_name varchar(35) not null,
  city varchar(35) not null,
  state char(2) not null,
  zip varchar(5) not null,
  email_address varchar(50) not null,
  product_id int not null,
  product_name varchar(50) not null,
  product_sale_date date not null);

comment on table sales is 'Contains records of books sold from the bookstore.';
comment on column sales.row_num is 'For de-duping purposes.';

-- PLEASE don't regard this as good database design; customers should be
-- foreign keys, as should products.

Perl script to create the inserts automatically: create_input_sql.pl:

#!/usr/bin/perl

# create_input_sql.pl --> Create PostgreSQL sql statements from
# .csv file for products table, and put them in a file called
# runme.postgresql.

# by Manni Wood

use strict;

my $csv_file_name = $ARGV[0];
unless ($csv_file_name) {
    print STDERR "Please enter the name of a CSV file.\n";
    print STDERR "Aborting.\n";
    exit 1;
}
my $output_file_name = "runme.postgresql";

open (IN, $csv_file_name) || die "Could not read \"$csv_file_name\": $!";
open (OUT, ">$output_file_name") || die "Could not write \"$output_file_name\": $!";
# The first line of this file is the field names, which we do *not*
# want to insert into the database. So read and discard the first line.
$_ = <IN>;

# Instructions to psql to stop on the first error.
print OUT "\\set ON_ERROR_STOP 1\n";
# Do this all as one transaction.
print OUT "begin;\n";

while (<IN>) {
    # Blank lines can creep into .csv files. Skip them.
    next if (m/^\s*$/);
    chomp;
    my ($id, $name, $description, $price) = parse_csv($_);

    # Make all fields SQL-friendly
    $id = nullify_field($id);
    $name = quote_or_nullify_field($name);
    $description = quote_or_nullify_field($description);
    $price = nullify_field($price);

    print OUT "insert into products (id, name, description, price) values (" .
    $id . ", " .
    $name . ", " .
    $description . ", " .
    $price . ");\n";
}
print OUT "commit;\n";
close (OUT);
close (IN);

sub parse_csv {
    # The goal here is to just go ahead and split on commas,
    # and then find chunks that start with a " and assume that we've
    # broken apart a field containing commas; re-join the chunks to
    # the chunk beginning with " until we find a chunk ending with ".
    my @chunks;
    my $chunk;
    my @fields;
    my $field;
    my $line = shift;

    @chunks = split(/,/, $line);

    my $i;
    for ($i = 0; $i <= $#chunks; ++$i) {
        $chunk = $chunks[$i];
        $field = $chunk;
        # If chunk starts with a double-quote but does not end with one,
        if (substr($chunk, 0, 1) eq '"' && substr($chunk, -1, 1) ne '"') {
            # join the current chunk with the next chunk, replacing the
            # comma that got eliminated during the split on commas
            ++$i;
            $chunk = $chunks[$i];
            $field .= "," . $chunk;
            # and continue to do so until you find a chunk that ends with "
            # or you have run out of chunks.
            while (substr($chunk, -1, 1) ne '"' && $i <= $#chunks) {
                ++$i;
                $chunk = $chunks[$i];
                $field .= "," . $chunk;
            }
            # Our goal is to get rid of all field delimiters, so
            # get rid of the leading double-quote
            $field = substr($field, 1);
            # and the trailing double-quote.
            chop($field);
        }
        # If the chunk starts and ends with a double-quote,
        if (substr($chunk, 0, 1) eq '"' && substr($chunk, -1, 1) eq '"') {
            # get rid of the leading double-quote
            $field = substr($field, 1);
            # and the trailing double-quote.
            chop($field);
        }
        # In CSV files, double quotes are escaped by doubling them up,
        # so un-double them.
        $field =~ s/""/"/g;
        # Finally, we have a field that is completely usable, so add it to the
        # array of fields we will return at the end of this subroutine.
        push(@fields, $field);
        # Clear the field for re-use in the next iteration of this loop.
        $field = "";
    }
    return @fields;
}

sub nullify_field {
    # For SQL fields that do not need to be wrapped in single quotes.
    # If a field is empty, replace it with the string "null",
    # which can safely be used in sql insert statements.
    my $field = $_[0];
    if ($field) {
        return $field;
    } else {
        return "null";
    }
}

sub quote_or_nullify_field {
    # For SQL fields that need to be wrapped in single quotes.
    # If a field is empty, replace it with the string "null",
    # which can safely be used in sql insert statements.
    # If it is not empty, replace any single quote with double single
    # quotes (that's how you escape them in SQL) and wrap the whole thing
    # in single quotes (for use in a SQL insert statement).
    my $field = $_[0];
    if ($field) {
        # Remove backslashes: they cause havoc in SQL.
        $field =~ s/\\//g;
        # Repeat single quotes: this is how they are escaped in SQL.
        $field =~ s/'/''/g;
        $field = "'" . $field . "'";
    } else {
        $field = "null";
    }
    return $field;
}

load_new_products.sh:

#!/bin/sh

# load_new_products.sh
#
# Wrapper script that transforms products.csv into runme.postgresql
# and feeds runme.postgresql to psql.

PGUSER=postgres
PGPASSWORD=postgres
PGHOST=localhost
PGPORT=5432

export PGUSER PGPASSWORD PGHOST PGPORT

./create_input_sql.pl products.csv  # outputs runme.postgresql

psql -f runme.postgresql postgres 1> load_new_products.log 2>&1

#reset PGUSER and PGPASSWORD  
PGUSER=""
PGPASSWORD=""
PGHOST=""
PGPORT=""
export PGUSER PGPASSWORD PGHOST PGPORT  
 
#End
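One detail worth noting in the script above: `1> load_new_products.log 2>&1` sends both psql's standard output and its error output to the log, which is why errors also show up there. A minimal illustration of the same redirection (demo.log is a throwaway name):

```shell
#!/bin/sh
# Redirecting stdout first (1>) and then pointing stderr at stdout (2>&1)
# sends both streams to the same log file.
{ echo "ok line"; echo "error line" >&2; } 1> demo.log 2>&1
grep -c "line" demo.log
# prints: 2
```

The order matters: writing `2>&1 1> demo.log` instead would leave stderr on the terminal.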

Script to generate the reports: gen_sales_report_param.sh:

#!/bin/sh

# gen_sales_report_param.sh
PGUSER=postgres
PGPASSWORD=postgres
PGHOST=localhost
PGPORT=5432

export PGUSER PGPASSWORD PGHOST PGPORT

psql --set start_date=\'$1\' \
     --set end_date=\'$2\' \
     -f report_mensual_param.sql postgres

#reset PGUSER and PGPASSWORD  
PGUSER=""
PGPASSWORD=""
PGHOST=""
PGPORT=""
export PGUSER PGPASSWORD PGHOST PGPORT  
 
#End

The script report_mensual_param.sql:

-- report_mensual_param.sql
-- 
-- Outputs one month's worth of sales to a CSV file that
-- is easily imported into spreadsheet software.

\o report.csv
\a
\f ,
select product_id, product_name, product_sale_date from sales where
product_sale_date >= :start_date and product_sale_date <= :end_date;
\o

Using the \o option

to send the result of a query to a file:

\o [FILE]      send all query results to file or |pipe
postgres# \o report.txt
postgres# select product_id, product_name, product_sale_date from sales where 
product_sale_date >= '2004-10-01' and product_sale_date <= '2004-10-31';
postgres# \o 

To obtain a result that can be used in Excel or OpenOffice.org:

\o report.csv
\a -> toggle between unaligned and aligned output mode
\f , -> \f [STRING]    show or set field separator for unaligned query output (in this case, a comma)
postgres# select product_id, product_name, product_sale_date from sales where 
product_sale_date >= '2004-10-01' and product_sale_date <= '2004-10-31';
postgres# \o

Using the \set option

\set [NAME [VALUE]] -> set internal variable, or list all if no parameters
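This is the mechanism gen_sales_report_param.sh relies on: variables defined with --set (the command-line form of \set) are interpolated wherever :name appears in the SQL file. The backslash-escaped quotes in --set start_date=\'$1\' make the shell hand psql a value that already includes the single quotes, so :start_date expands to a quoted SQL literal. A quick shell illustration, with echo standing in for psql:

```shell
#!/bin/sh
# What psql actually receives as its --set argument: the backslash-escaped
# quotes survive shell expansion, so the variable value is already a quoted
# SQL literal by the time :start_date is interpolated.
d=2004/10/14
echo --set start_date=\'$d\'
# prints: --set start_date='2004/10/14'
```

Without those escaped quotes, :start_date would expand to a bare 2004/10/14 and the date comparison in the SQL file would fail.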

Duration

2 hours