ASI-C6-ASGBD Practical: Scripts with PostgreSQL
Contents
Objectives: Automating PostgreSQL Tasks
We are going to automate administrative tasks using different techniques and scripts. Specifically, you will use scripts written in SQL, bash and Perl, together with several options of the psql command. This practical is an example of how system administrators and DBAs need to know the console, and not be afraid of working in it, in order to automate tasks and save themselves work.
Carrying out the practical
Populating the PRODUCTS table with values supplied in CSV format
To start working, first of all we must create the products table in PostgreSQL. The example uses the postgres database and the postgres user. Use the creacio_taula_productes.sql script.
We now have the table. The data is in the file products.csv, in CSV format. In this case there are only three rows, but there could be hundreds or thousands. To convert the CSV format into the INSERT INTO format that PostgreSQL understands, we use a script written in Perl: create_input_sql.pl. This script is wrapped inside the bash script load_new_products.sh, so the whole process is now in place: the source is the file products.csv, and at the destination end we must check that the data has been imported into the PRODUCTS table.
$ sudo ./load_new_products.sh
which in turn runs the command:
./create_input_sql.pl products.csv
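As a sketch of what the conversion does: the CSV content below is hypothetical (the real products.csv ships with the practical), but it shows the assumed column layout and the quoting rules the Perl script handles.

```shell
# Hypothetical products.csv content; the assumed column layout is
# id,name,description,price, matching the products table. The first
# line holds the field names and is skipped by create_input_sql.pl.
cat > products_example.csv <<'EOF'
id,name,description,price
1,The Big Widget,"A widget with ""character"", sold by the unit",19.95
EOF
# create_input_sql.pl would turn the data row into roughly:
#   \set ON_ERROR_STOP 1
#   begin;
#   insert into products (id, name, description, price) values
#     (1, 'The Big Widget', 'A widget with "character", sold by the unit', 19.95);
#   commit;
cat products_example.csv
```

Note how a field containing commas is wrapped in double quotes, and a literal double quote inside it is doubled; parse_csv in the Perl script undoes both conventions.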
To check that the process ran correctly we can inspect the files load_new_products.log and runme.postgresql.
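That check can itself be scripted. The sketch below scans the log for errors; the log lines shown are the kind psql prints for a clean three-row load (treat the exact format as an assumption), written to a stand-in file so the example is self-contained.

```shell
# Stand-in log with the kind of output psql writes on a clean run
# (BEGIN, one INSERT tag per row, COMMIT); created here only so the
# sketch runs without a database.
printf 'BEGIN\nINSERT 0 1\nINSERT 0 1\nINSERT 0 1\nCOMMIT\n' > load_example.log
# Count inserted rows and make sure no ERROR lines appear.
inserts=$(grep -c '^INSERT' load_example.log)
if grep -q 'ERROR' load_example.log; then
  echo "load failed"
else
  echo "loaded $inserts rows"
fi
```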
Generating the sales reports
First of all, some sample data for the sales table:
insert into sales (first_name, last_name, city, state, zip, email_address, product_id, product_name, product_sale_date) values ('John', 'Hancock', 'Medford', 'MA', '02145', 'john_hancock@pretendmail.com', 234, 'The Signature Widget', '2004-10-13');
insert into sales (first_name, last_name, city, state, zip, email_address, product_id, product_name, product_sale_date) values ('Bob', 'Smith', 'Boston', 'MA', '02123', 'bob_smith@pretendmail.com', 234, 'The Bigger Widget', '2004-10-14');
insert into sales (first_name, last_name, city, state, zip, email_address, product_id, product_name, product_sale_date) values ('Jane', 'Doe', 'Somerville', 'MA', '02144', 'jane_doe@pretendmail.com', 234, 'The Nuclear Widget', '2004-10-15');
insert into sales (first_name, last_name, city, state, zip, email_address, product_id, product_name, product_sale_date) values ('John', 'Adams', 'Cambridge', 'MA', '02122', 'john_adams@pretendmail.com', 234, 'The Beer Widget', '2004-10-16');
insert into sales (first_name, last_name, city, state, zip, email_address, product_id, product_name, product_sale_date) values ('Joe', 'Random', 'New York', 'NY', '02122', 'john_smith@pretendmail.com', 234, 'The Stylish Widget', '2004-10-17');
Now we can generate the reports. We will do it with a script that takes two parameters, the start and end dates over which to build the report. For example:
./gen_sales_report_param.sh 2004/10/14 2004/10/16
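Since the script interpolates $1 and $2 straight into psql variables, it helps to validate the argument format before calling psql. A minimal sketch (the function name and pattern are assumptions, matching the yyyy/mm/dd form used above):

```shell
# Accept only dates shaped like yyyy/mm/dd, as in the example invocation.
check_date() {
  case "$1" in
    [0-9][0-9][0-9][0-9]/[0-9][0-9]/[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}
check_date "2004/10/14" && echo "start date ok"
check_date "14-10-2004" || echo "bad date format"
```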
#!/bin/sh
# gen_sales_report_param.sh
PGUSER=postgres
PGPASSWORD=postgres
PGHOST=localhost
PGPORT=5432
export PGUSER PGPASSWORD PGHOST PGPORT
psql --set start_date=\'$1\' \
--set end_date=\'$2\' \
-f generate_monthly_sales_report.postgresql sales_db
#reset PGUSER and PGPASSWORD
PGUSER=""
PGPASSWORD=""
PGHOST=""
PGPORT=""
export PGUSER PGPASSWORD PGHOST PGPORT
#End
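The backslash-escaped quotes around $1 and $2 make the shell hand psql values that are already wrapped in single quotes, so that :start_date expands to a valid SQL string literal. A quick way to see what psql actually receives:

```shell
# Simulate the positional parameter and show the expanded argument.
set -- "2004/10/14"
echo start_date=\'$1\'
# prints: start_date='2004/10/14'
```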
where the file generate_monthly_sales_report.postgresql is:
-- generate_monthly_sales_report.postgresql
--
-- Outputs one month's-worth of sales to a CSV file that
-- is easily imported into spreadsheet software.
\o report.csv
\a
\f ,
select product_id, product_name, product_sale_date
from sales
where product_sale_date >= :start_date
and product_sale_date <= :end_date;
\o
The report you should obtain in the file report.csv looks similar to:
product_id,product_name,product_sale_date
234,The Big Widget,2004-10-13
234,The Signature Widget,2004-10-13
234,The Bigger Widget,2004-10-14
234,The Nuclear Widget,2004-10-15
234,The Beer Widget,2004-10-16
234,The Stylish Widget,2004-10-17
(6 rows)
We choose CSV because it is the natural format for importing data into spreadsheets.
Finally, cron
To finish, the execution of these scripts and the generation of the reports can be automated so that they run periodically. For example, one problem to work on would be the automatic generation of a weekly sales report. We would have to modify the previous script and create a cron job that generates the report for us. Instead of being called report.csv, the file should be called report_ddmmyyyy.csv (where ddmmyyyy is replaced by the day, month and year).
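A sketch of the two pieces involved: building the date-stamped file name, and the crontab line that would schedule the job (the script path and schedule are assumptions):

```shell
# Date-stamped report name, as suggested above (day, month, year).
REPORT="report_$(date +%d%m%Y).csv"
echo "$REPORT"
# A hypothetical crontab entry running the generator every Monday at 06:00:
#   0 6 * * 1 /path/to/gen_weekly_sales_report.sh
```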
Submission
Moodle
Resources
The creacio_taula_productes.sql script:
drop sequence products_row_num_seq;
create sequence products_row_num_seq cycle;
drop table products;
create table products (
row_num int not null default nextval('products_row_num_seq'),
id int primary key,
name varchar(35) not null,
description varchar(1024) null,
price numeric(8, 2) not null);
comment on table products is 'Records of products sold at the bookstore.';
comment on column products.row_num is 'For de-duping purposes.';
drop sequence sales_row_num_seq;
create sequence sales_row_num_seq cycle;
drop table sales;
create table sales (
row_num int not null default nextval('sales_row_num_seq'),
first_name varchar(35) not null,
middle_name varchar(35) null,
last_name varchar(35) not null,
city varchar(35) not null,
state char(2) not null,
zip varchar(5) not null,
email_address varchar(50) not null,
product_id int not null,
product_name varchar(50) not null,
product_sale_date date not null);
comment on table sales is 'Contains records of books sold from the bookstore.';
comment on column sales.row_num is 'For de-duping purposes.';
-- PLEASE don't regard this as good database design; customers should be
-- foreign keys, as should products.
Perl script that generates the inserts automatically: create_input_sql.pl:
#!/usr/bin/perl
# create_input_sql.pl --> Create PostgreSQL sql statements from
# .csv file for products table, and put them in a file called
# runme.postgresql.
# by Manni Wood
use strict;
my $csv_file_name = $ARGV[0];
unless ($csv_file_name) {
print STDERR "Please enter the name of a CSV file.\n";
print STDERR "Aborting.\n";
exit 1;
}
my $output_file_name = "runme.postgresql";
open (IN, $csv_file_name) || die "Could not read \"$csv_file_name\": $!";
open (OUT, ">$output_file_name") || die "Could not write \"$output_file_name\": $!";
# The first line of this file is the field names, which we do *not*
# want to insert into the database. So read and discard the first line.
$_ = <IN>;
# Instructions to psql to stop on the first error.
print OUT "\\set ON_ERROR_STOP 1\n";
# Do this all as one transaction.
print OUT "begin;\n";
while (<IN>) {
# Blank lines can creep into .csv files. Skip them.
next if (m/^\s*$/);
chomp;
my ($id, $name, $description, $price) = parse_csv($_);
# Make all fields SQL-friendly
$id = nullify_field($id);
$name = quote_or_nullify_field($name);
$description = quote_or_nullify_field($description);
$price = nullify_field($price);
print OUT "insert into products (id, name, description, price) values (" .
$id . ", " .
$name . ", " .
$description . ", " .
$price . ");\n";
}
print OUT "commit;\n";
close (OUT);
close (IN);
sub parse_csv {
# The goal here is to just go ahead and split on commas,
# and then find chunks that start with a " and assume that we've
# broken apart a field containing commas; re-join the chunks to
# the chunk beginning with " until we find a chunk ending with ".
my @chunks;
my $chunk;
my @fields;
my $field;
my $line = shift;
@chunks = split(/,/, $line);
my $i;
for ($i = 0; $i <= $#chunks; ++$i) {
$chunk = $chunks[$i];
$field = $chunk;
# If chunk starts with a double-quote but does not end with one,
if (substr($chunk, 0, 1) eq '"' && substr($chunk, -1, 1) ne '"') {
# join the current chunk with the next chunk, replacing the
# comma that got eliminated during the split on commas
++$i;
$chunk = $chunks[$i];
$field .= "," . $chunk;
# and continue to do so until you find a chunk that ends with "
# or you have run out of chunks.
while (substr($chunk, -1, 1) ne '"' && $i <= $#chunks) {
++$i;
$chunk = $chunks[$i];
$field .= "," . $chunk;
}
# Our goal is to get rid of all field delimiters, so
# get rid of the leading double-quote
$field = substr($field, 1);
# and the trailing double-quote.
chop($field);
}
# If the chunk starts and ends with a double-quote,
if (substr($chunk, 0, 1) eq '"' && substr($chunk, -1, 1) eq '"') {
# get rid of the leading double-quote
$field = substr($field, 1);
# and the trailing double-quote.
chop($field);
}
# In CSV files, double quotes are escaped by doubling them up,
# so un-double them.
$field =~ s/""/"/g;
# Finally, we have a field that is completely usable, so add it to the
# array of fields we will return at the end of this subroutine.
push(@fields, $field);
# Clear the field for re-use in the next iteration of this loop.
$field = "";
}
return @fields;
}
sub nullify_field {
# For SQL fields that do not need to be wrapped in single quotes.
# If a field is empty, replace it with the string "null",
# which can safely be used in sql insert statements.
my $field = $_[0];
if ($field) {
return $field;
} else {
return "null";
}
}
sub quote_or_nullify_field {
# For SQL fields that need to be wrapped in single quotes.
# If a field is empty, replace it with the string "null",
# which can safely be used in sql insert statements.
# If it is not empty, replace any single quote with double single
# quotes (that's how you escape them in SQL) and wrap the whole thing
# in single quotes (for use in a SQL insert statement).
my $field = $_[0];
if ($field) {
# Remove backslashes: they cause havoc in SQL.
$field =~ s/\\//g;
# Repeat single quotes: this is how they are escaped in SQL.
$field =~ s/'/''/g;
$field = "'" . $field . "'";
} else {
$field = "null";
}
return $field;
}
load_new_products.sh:
#!/bin/sh
# load_new_products.sh
#
# Wrapper script that transforms products.csv into runme.postgresql
# and feeds runme.postgresql to psql.
PGUSER=postgres
PGPASSWORD=postgres
PGHOST=localhost
PGPORT=5432
export PGUSER PGPASSWORD PGHOST PGPORT
./create_input_sql.pl products.csv # outputs runme.postgresql
psql -f runme.postgresql postgres 1> load_new_products.log 2>&1
#reset PGUSER and PGPASSWORD
PGUSER=""
PGPASSWORD=""
PGHOST=""
PGPORT=""
export PGUSER PGPASSWORD PGHOST PGPORT
#End
Report-generation script: gen_sales_report_param.sh:
#!/bin/sh
# gen_sales_report_param.sh
PGUSER=postgres
PGPASSWORD=postgres
PGHOST=localhost
PGPORT=5432
export PGUSER PGPASSWORD PGHOST PGPORT
psql --set start_date=\'$1\' \
--set end_date=\'$2\' \
-f report_mensual_param.sql postgres
#reset PGUSER and PGPASSWORD
PGUSER=""
PGPASSWORD=""
PGHOST=""
PGPORT=""
export PGUSER PGPASSWORD PGHOST PGPORT
#End
The report_mensual_param.sql script:
-- report_mensual_param.sql
--
-- Outputs one month's-worth of sales to a CSV file that
-- is easily imported into spreadsheet software.
\o report.csv
\a
\f ,
select product_id, product_name, product_sale_date
from sales
where product_sale_date >= :start_date
and product_sale_date <= :end_date;
\o
Using the \o option
to send the result of a query to a file:
\o [FILE]   send all query results to file or |pipe

postgres=# \o report.txt
postgres=# select product_id, product_name, product_sale_date
           from sales
           where product_sale_date >= '2004-10-01'
           and product_sale_date <= '2004-10-31';
postgres=# \o
To produce output that can be used directly in Excel or OpenOffice.org:
\o report.csv
\a    -> toggle between unaligned and aligned output mode
\f ,  -> \f [STRING]  show or set field separator for unaligned query output (in this case, a comma)

postgres=# select product_id, product_name, product_sale_date
           from sales
           where product_sale_date >= '2004-10-01'
           and product_sale_date <= '2004-10-31';
postgres=# \o
Using the \set option
\set [NAME [VALUE]] -> set internal variable, or list all if no parameters
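psql substitutes :name variables textually into the query before running it. A quick shell sketch that emulates that substitution, just to show the idea (the sed call stands in for what psql does internally):

```shell
# The wrapper scripts above pass --set start_date='...' on the command
# line; inside the SQL file, :start_date is then replaced with that value.
START_DATE="'2004-10-01'"
echo "select * from sales where product_sale_date >= :start_date;" \
  | sed "s/:start_date/$START_DATE/"
# prints: select * from sales where product_sale_date >= '2004-10-01';
```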
Duration
2 hours