Twitter API


Book: O'Reilly, 21 Recipes for Mining Twitter (Python)

It is worth noting that this book is Python-oriented, but Python is only one of the ways to access the Twitter API:

python-twitter - A python wrapper around the Twitter API

The example code is available at:

Configuration

At first it doesn't work:

$ python recipe__oauth_login.py 
Traceback (most recent call last):
  File "recipe__oauth_login.py", line 5, in <module>
    import twitter
ImportError: No module named twitter
$ sudo apt-get install python-setuptools
$ sudo easy_install twitter
Searching for twitter
Reading http://pypi.python.org/simple/twitter/
Reading http://mike.verdone.ca/twitter/
Best match: twitter 1.6.1
Downloading http://pypi.python.org/packages/source/t/twitter/twitter-1.6.1.tar.gz#md5=f662b29747b3f18b824d4f9f07cfe1bf
...
$ python recipe__oauth_login.py 
Hi there! We're gonna get you all set up to use .
Traceback (most recent call last):
  File "recipe__oauth_login.py", line 41, in <module>
    oauth_login(APP_NAME, CONSUMER_KEY, CONSUMER_SECRET)
  File "recipe__oauth_login.py", line 19, in oauth_login
    consumer_secret)
  File "/usr/local/lib/python2.6/dist-packages/twitter-1.6.1-py2.6.egg/twitter/oauth_dance.py", line 35, in oauth_dance
    twitter.oauth.request_token())
  File "/usr/local/lib/python2.6/dist-packages/twitter-1.6.1-py2.6.egg/twitter/api.py", line 153, in __call__
    return self._handle_response(req, uri, arg_data)
  File "/usr/local/lib/python2.6/dist-packages/twitter-1.6.1-py2.6.egg/twitter/api.py", line 168, in _handle_response
    raise TwitterHTTPError(e, uri, self.format, arg_data)
twitter.api.TwitterHTTPError: Twitter sent status 401 for URL: oauth/request_token. using parameters: (oauth_consumer_key=&oauth_nonce=8172004620813145068&oauth_signature_method=HMAC-SHA1&oauth_timestamp=1309481951&oauth_version=1.0&oauth_signature=WA5pDjCJvlsDGkdXJSolde95QY4%3D)
details: Failed to validate oauth signature and token


So, as the book says on its very first page, the first thing I have to do is register:

 In the case of Twitter, the first step involved is registering your application with Twitter at http://dev.twitter.com/apps where Twitter provides you with a consumer key and consumer secret that uniquely identify your application.

I need all of this because I will in fact be searching for information that belongs to other users, and OAuth is an authentication method that allows this to be done securely, without having to know their login and password.

Register an Application

The application registers correctly:

@Anywhere Settings:

OAuth 1.0a Settings:

We support HMAC-SHA1 signatures. We do not support the plaintext signature method.

General

Note: xAuth is not enabled for applications by default. See the xAuth Documentation for more information.

So, in the first code example, recipe__oauth_login.py, I have to fill in the correct values for my application.

    APP_NAME = 'JoanilloRobot'
    CONSUMER_KEY = '8Nvhw1Gp4FNI2FXaCyI5Q'
    CONSUMER_SECRET = '1Eh6lldydvxOiLfQE1t2KVyPfZhCDjG1GzS5Ty7o'

    oauth_login(APP_NAME, CONSUMER_KEY, CONSUMER_SECRET)

Note: new values!

    APP_NAME = 'JoanilloRobot'
    CONSUMER_KEY = 'HzbqDXoXU80yZpKFVIwtA'
    CONSUMER_SECRET = 'nCvbUTtlkeEJjeME6Wbt1hQQT1iDv4bOGSlxCyseE'

$ python recipe__oauth_login.py 

and now it works. A web page appears in the browser asking me to grant authorization.

Do you authorize JoanilloRobot to use your account? JoanilloRobot

Developer

   By www.joanillo.org

Application URL

   www.joanillo.org

About this application

   TwitterRobot is a social robot that twits, and his aim is to have many friends!

Cancel, and return to app

This application will be able to:

   * Read Tweets from your timeline.
   * See who you follow.

Do you want to authorize JoanilloRobot to access your account?

This application will not be able to:

   * Follow new people.
   * Update your profile.
   * Post Tweets for you.
   * Access your direct messages.
   * See your Twitter password.

Note that it cannot follow other people, among other things, because I only granted read permission.

I click accept, and it gives me the PIN I have to enter in the console: 2463750

$ python recipe__oauth_login.py 
Hi there! We're gonna get you all set up to use JoanilloRobot.
/usr/local/lib/python2.6/dist-packages/twitter-1.6.1-py2.6.egg/twitter/api.py:83: DeprecationWarning: object.__init__() takes no parameters
  response_typ.__init__(self, response)

In the web browser window that opens please choose to Allow
access. Copy the PIN number that appears on the next page and paste or
type it here:

Opening: http://api.twitter.com/oauth/authorize?oauth_token=KyF7zUkBpJQU2TrzYi0kUu2PxQuHRzipz50ikbplos

Please enter the PIN: 2463750
OAuth Success. Token file stored to out/twitter.oauth

The link to configure the JoanilloRobot application is:

and now I select Read, Write and Direct Messages (I also clicked the option to reset the key, which was not necessary!).

How can I make sure I am doing it right and that I now have the correct permissions? I want the authorization screen to appear again in the web browser and state clearly what the permissions are. Very easy:

$ rm out/twitter.oauth (this is where the token information is stored)
Do you authorize JoanilloRobot to use your account?
JoanilloRobot

Developer
    By www.joanillo.org
Application URL
    www.joanillo.org
About this application

    TwitterRobot is a social robot that twits, and his aim is to have many friends!

This application will be able to:

    * Read Tweets from your timeline.
    * See who you follow and follow new people.
    * Update your profile.
    * Post Tweets for you.
    * Access your direct messages.

Do you want to authorize JoanilloRobot to access your account?

This application will not be able to:

    * See your Twitter password.

As you can see, I will now be able to publish tweets automatically and find out who follows me.

Examples from the book and comments

recipe__get_trending_topics.py (pages 3-4)

# -*- coding: utf-8 -*-

import json
import twitter

t = twitter.Twitter(domain='api.twitter.com', api_version='1')

print json.dumps(t.trends(), indent=1)

$ python recipe__get_trending_topics.py

and the console output is as follows (remember that we configured the application as a console application; the other option was to configure it as a browser application):

{
 "trends": [
  {
   "url": "http://search.twitter.com/search?q=%23nothingsmoreirritating", 
   "name": "#nothingsmoreirritating"
  }, 
  {
   "url": "http://search.twitter.com/search?q=%23pricesthatshockyou", 
   "name": "#pricesthatshockyou"
  }, 
  {
   "url": "http://search.twitter.com/search?q=%235thingsyoucantdo", 
   "name": "#5thingsyoucantdo"
  }, 
  {
   "url": "http://search.twitter.com/search?q=%22Afternoon%20Delight%22", 
   "name": "Afternoon Delight"
  }, 
  {
   "url": "http://search.twitter.com/search?q=RodrigoChorandoSeFoi", 
   "name": "RodrigoChorandoSeFoi"
  }, 
  {
   "url": "http://search.twitter.com/search?q=%22NBA%20&%20NFL%22", 
   "name": "NBA & NFL"
  }, 
  {
   "url": "http://search.twitter.com/search?q=DVDHappyRockSunday", 
   "name": "DVDHappyRockSunday"
  }, 
  {
   "url": "http://search.twitter.com/search?q=%22Strauss-Kahn%20Case%20Seen%22", 
   "name": "Strauss-Kahn Case Seen"
  }, 
  {
   "url": "http://search.twitter.com/search?q=%22Carmen%20Ohio%22", 
   "name": "Carmen Ohio"
  }, 
  {
   "url": "http://search.twitter.com/search?q=Dorival", 
   "name": "Dorival"
  }
 ], 
 "as_of": "Fri, 01 Jul 2011 01:49:11 +0000"
}

and this list corresponds fairly closely to what I can see on http://twitter.com (Trending topics; I switched from Spain to Global).
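
The structure above is plain JSON, so it can be processed with the standard json module; a minimal sketch of my own (with a trimmed version of the response hardcoded for illustration):

```python
import json

# A trimmed version of the response shown above, hardcoded for illustration.
raw = '''
{
 "trends": [
  {"url": "http://search.twitter.com/search?q=%23nothingsmoreirritating",
   "name": "#nothingsmoreirritating"},
  {"url": "http://search.twitter.com/search?q=%22Afternoon%20Delight%22",
   "name": "Afternoon Delight"}
 ],
 "as_of": "Fri, 01 Jul 2011 01:49:11 +0000"
}
'''

data = json.loads(raw)

# Pull out just the trend names, as displayed on twitter.com.
names = [trend['name'] for trend in data['trends']]
print(names)
```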

recipe__trending_topics_time_series.py (pages 3-4)

# -*- coding: utf-8 -*-

import os
import sys
import datetime
import time
import json
import twitter

t = twitter.Twitter(domain='api.twitter.com', api_version='1')

if not os.path.isdir('out/trends_data'):
    os.makedirs('out/trends_data')

while True:

    now = str(datetime.datetime.now())

    trends = json.dumps(t.trends(), indent=1)

    f = open(os.path.join(os.getcwd(), 'out', 'trends_data', now), 'w')
    f.write(trends)
    f.close()

    print >> sys.stderr, "Wrote data file", f.name
    print >> sys.stderr, "Zzz..."

    time.sleep(60) # 60 seconds

joan@joanillo32:~$ python recipe__trending_topics_time_series.py
Wrote data file /home/joan/out/trends_data/2011-07-01 14:04:22.609674
Zzz...
Wrote data file /home/joan/out/trends_data/2011-07-01 14:05:24.004790
Zzz...

Every minute it writes the trending-topics information to a file, and the files can be analyzed later.
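
One possible later analysis is counting in how many snapshots each topic appears, which hints at how persistent a trend is; a sketch over hardcoded snapshots (the file-reading step and the topic names here are hypothetical):

```python
from collections import Counter

# Two hypothetical snapshots, as they would be parsed from the
# out/trends_data files written every minute.
snapshots = [
    {'trends': [{'name': '#arduino'}, {'name': 'NBA & NFL'}]},
    {'trends': [{'name': '#arduino'}, {'name': 'Dorival'}]},
]

# Count how many snapshots each topic appears in.
persistence = Counter()
for snapshot in snapshots:
    for trend in snapshot['trends']:
        persistence[trend['name']] += 1

print(persistence.most_common(1))  # the most persistent topic
```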

recipe__extract_tweet_entities.py (page 5)

To use twitter_text (import twitter_text in the code) I first have to install it.

$ sudo easy_install twitter-text-py
Searching for twitter-text-py
Reading http://pypi.python.org/simple/twitter-text-py/
Reading http://github.com/dryan/twitter-text-py
Best match: twitter-text-py 1.0.3
Downloading http://pypi.python.org/packages/2.6/t/twitter-text-py/twitter_text_py-1.0.3-py2.6.egg#md5=b1e69146f8094bc82aa2f0cc4730426e
Processing twitter_text_py-1.0.3-py2.6.egg
creating /usr/local/lib/python2.6/dist-packages/twitter_text_py-1.0.3-py2.6.egg
Extracting twitter_text_py-1.0.3-py2.6.egg to /usr/local/lib/python2.6/dist-packages
Adding twitter-text-py 1.0.3 to easy-install.pth file

Installed /usr/local/lib/python2.6/dist-packages/twitter_text_py-1.0.3-py2.6.egg
Processing dependencies for twitter-text-py
Finished processing dependencies for twitter-text-py

and then I can run the example:

$ python recipe__extract_tweet_entities.py 
[
 {
  "text": "Get @SocialWebMining example code at http://bit.ly/biais2 #w00t", 
  "entities": {
   "user_mentions": [
    {
     "indices": [
      4, 
      20
     ], 
     "screen_name": "SocialWebMining"
    }
   ], 
   "hashtags": [
    {
     "indices": [
      58, 
      63
     ], 
     "text": "w00t"
    }
   ], 
   "urls": [
    {
     "url": "http://bit.ly/biais2", 
     "indices": [
      37, 
      57
     ]
    }
   ]
  }
 }
]
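
A much simplified version of what twitter-text does can be sketched with the standard re module; this is my own illustration, not the recipe's code, and the real library handles many more edge cases:

```python
import re

def extract_entities(text):
    """Naive extraction of mentions, hashtags and URLs with their indices."""
    entities = {'user_mentions': [], 'hashtags': [], 'urls': []}
    for m in re.finditer(r'@(\w+)', text):
        entities['user_mentions'].append(
            {'screen_name': m.group(1), 'indices': [m.start(), m.end()]})
    for m in re.finditer(r'#(\w+)', text):
        entities['hashtags'].append(
            {'text': m.group(1), 'indices': [m.start(), m.end()]})
    for m in re.finditer(r'https?://\S+', text):
        entities['urls'].append(
            {'url': m.group(0), 'indices': [m.start(), m.end()]})
    return entities

tweet = "Get @SocialWebMining example code at http://bit.ly/biais2 #w00t"
print(extract_entities(tweet))
```

For this tweet the naive version reproduces the same indices as the JSON output above.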

Searching for tweets by keyword: recipe__search.py

In this case it expects one argument, the string to search for:

$ python recipe__search.py "arduino"

It takes a while, but it works. To display the results, instead of

print json.dumps(search_results, indent=1)

I can use

print [ result['text'] for result in search_results ]

and then it only shows the text of each tweet.

recipe__get_search_results_for_trending_topic.py

$ python recipe__get_search_results_for_trending_topic.py 
[0] #youneedtositdown
[1] #listentoyourheart
[2] #proudtobecanadian
[3] Murray/Nadal
[4] Dilma Gatinha
[5] Brendon Urie
[6] Mirror Mirror
[7] Novak Djokovic
[8] Malinga
[9] ATP

Pick a trend: 9
Fetching tweets for ATP...
Entities for tweets about trend 'ATP' saved to /home/joan/out/search_results.json

Extracting a Retweet’s Origins (page 10)

For example, looking at Twitter I see a tweet by @jordisalvia that @Rogerbuch has retweeted:

@jordisalvia Jordi Salvia
La CUP apareix al Baròmetre del Centre d'Estudis d'Opinió (CEO) http://twitpic.com/5j1uwz per davant de SI i C's en simpatia #unitatpopular
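
Before native retweets, attribution was a text convention ("RT @user", "via @user"), so the text-based part of origin extraction can be sketched with a regular expression; this is my own sketch of the idea, not the recipe's exact code:

```python
import re

def get_rt_origins(tweet_text):
    """Return the screen names a tweet credits via 'RT @' or 'via @'."""
    # 'RT @user' and 'via @user' were common conventions before native
    # retweets carried a retweeted_status field.
    rt_pattern = re.compile(r'(?:RT|via)\s+@(\w+)', re.IGNORECASE)
    return rt_pattern.findall(tweet_text)

text = "RT @jordisalvia La CUP apareix al Barometre del CEO #unitatpopular"
print(get_rt_origins(text))
```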

Creating a graph of retweet relationships (page 13)

We will use networkx to model the retweet relationships.

$ sudo easy_install networkx

A minimal example of creating a network graph:

# -*- coding: utf-8 -*-
import networkx as nx
g = nx.Graph()
g.add_edge("@user1", "@user2")
g.add_edge("@user1", "@user3")
g.add_edge("@user2", "@user3")

$ python recipe__create_rt_graph.py "arduino"

Number nodes: 1175
Num edges: 1048
Num connected components: 145
Node degrees: ['14OlGA02', '1520us', '1dois3bia', '1frasefoda', '1naaay', '1soul1dream1D', '1suzannie', '9_robson', 'ALINAJIBBAKSHER', 'Adaire_Parker', 'AdamLoveMusic', 'AdeLukia', 'AdelaidaLand', 'Adrianaa_k', 'Alef_Moreira', 'AlexisBogue', 'AlleShakira', 'Always_BieberBR', 'Alyssaa_Darling', 'Amanda_A_C', 'Andre_Jb2', 'Andreina_Y', 'AndresRive', 'Aninha_Paivah', 'Anna_Frases', 'Anna_caroliina', 'Anna_rouch', 'AprilBieber_xDD', 'ArannSaulVE', 'AranzaMHG', 'ArdiraDiandraR', 'AriannaDeBieber', 'Arikaaaaa', 'Asare_xo', 'Aulia_Iero', 'AyoTippMorgan94', 'B102U', 'BIEBEROBSESS3Dx',
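
networkx can report the connected components directly, but the counting logic can also be sketched without it, using a tiny union-find over the edge list; the users below are hypothetical and this is only an illustration of what "Num connected components" means:

```python
def count_components(edges):
    """Count connected components with a simple union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in edges:
        union(a, b)
    # Distinct roots = distinct components.
    return len({find(n) for n in parent})

# Hypothetical retweet edges: two separate clusters.
edges = [('@user1', '@user2'), ('@user1', '@user3'),
         ('@user2', '@user3'), ('@user4', '@user5')]
print(count_components(edges))  # 2 components
```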

Now comes the interesting part: visualizing the graph.

Visualizing a graph of retweet relationships

Linux and Unix users could simply emit DOT language output by using networkx.drawing.write_dot and then transform the DOT language output into a static image with the dot or circo utilities on the command line. For example, circo -Tpng -Otwitter_retweet_graph twitter_retweet_graph.dot would transform a sample DOT file to a PNG image with the same name.
$ dot
The program «dot» is not currently installed. You can install it by typing:
sudo apt-get install graphviz (which depends on libgraphviz4)

Now I can use both dot and circo.

$ python recipe__visualize_rt_graph_graphviz.py "#arduino"
Data file written to: /home/joan/out/twitter_retweet_graph
Try this on the DOT output: $ dot -Tpng -O/home/joan/out/twitter_retweet_graph /home/joan/out/twitter_retweet_graph.dot

I run into a problem when building the PNG image:

$ dot -Tpng -O/home/joan/out/twitter_retweet_graph /home/joan/out/twitter_retweet_graph.dot
dot: width (198609 >= 32768) is too large.
Segmentation fault

but other people have this problem too. Note: the dot utility tends to produce very tall images. Better to use circo.

and with circo I get this problem:

circo: failure to create cairo surface: out of memory
Segmentation fault

The dot and circo utilities basically work fine; the problem is that, depending on the query, the results obtained can be very large, and then the image is extremely big and cannot be rendered. In that case I can edit the .dot file by hand and remove lines, and then I will see that it can indeed be rendered.
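
Since a .dot file is plain text, a trimmed-down graph can also be generated (or filtered) with a few lines of Python; a sketch of mine with hypothetical users:

```python
def to_dot(edges):
    """Render an undirected edge list as a minimal DOT graph."""
    lines = ['graph retweets {']
    for a, b in edges:
        lines.append('  "%s" -- "%s";' % (a, b))
    lines.append('}')
    return '\n'.join(lines)

# Hypothetical edges; in practice this would be a filtered subset of the
# edges harvested by recipe__create_rt_graph.py.
edges = [('@user1', '@user2'), ('@user1', '@user3')]
dot_source = to_dot(edges)
print(dot_source)
# The resulting file can then be rendered with dot or circo as above.
```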

$ circo -Tpng -O/home/joan/out/twitter_retweet_graph /home/joan/out/twitter_retweet_graph.dot
$ circo -Tsvg -O/home/joan/out/twitter_retweet_graph /home/joan/out/twitter_retweet_graph.dot

When the PNG cannot be rendered (out of memory), the svg or ps formats do not produce this error, and then I can see how big the image really is.

The real problem I have is that the output file of the command

$ python recipe__visualize_rt_graph_graphviz.py "#arduino"

does not output information related to arduino... that is the real problem I have to solve.

Protovis: a JavaScript library for charts: http://weblatam.com/wp/protovis-libreria-javascript-para-graficos/

Since Protovis is a JavaScript library, the idea is that the visualization happens in a web browser.

Capturing Tweets in Real-time with the Streaming API

$ sudo apt-get install pip

this is the Perl installer (Perl Installation Program), which is not what I need here.

I need the python-tweepy package:

$ sudo joe /etc/apt/sources.list

deb http://ppa.launchpad.net/chris-lea/python-tweepy/ubuntu lucid main 
deb-src http://ppa.launchpad.net/chris-lea/python-tweepy/ubuntu lucid main 

$ sudo apt-get update
$ sudo apt-get install python-tweepy

At first it doesn't work:

I got this error for recipe__streaming_api.py:

Traceback (most recent call last):
File "stream_tweepy.py", line 65, in
streaming_api = tweepy.streaming.Stream(auth, CustomStreamListener(), timeout=60)
TypeError: init() takes at least 4 non-keyword arguments (3 given)

$ python
>>> import tweepy
>>> tweepy.__version__
'1.7.1'
>>> 

Making robust Twitter requests

Harvesting tweets

$ python recipe__get_tweet_by_id.py 24877908333961216
{
 "favorited": false, 
 "contributors": null, 
 "truncated": false, 
 "text": "Want to get started hacking on some Twitter data? Try this turn-key exercise for visualizing RTs http://bit.ly/9SZ2kb (via @SocialWebMining)", 
 "in_reply_to_status_id": null, 
...

Using the couchdb database:

$ sudo easy_install couchdb

Searching for couchdb
Reading http://pypi.python.org/simple/couchdb/
Reading http://code.google.com/p/couchdb-python/
Best match: CouchDB 0.8
Downloading http://pypi.python.org/packages/2.6/C/CouchDB/CouchDB-0.8-py2.6.egg#md5=b47f8fe5f0c76d7c45bf8e4805d43de4
Processing CouchDB-0.8-py2.6.egg
creating /usr/local/lib/python2.6/dist-packages/CouchDB-0.8-py2.6.egg
Extracting CouchDB-0.8-py2.6.egg to /usr/local/lib/python2.6/dist-packages
Adding CouchDB 0.8 to easy-install.pth file
Installing couchdb-dump script to /usr/local/bin
Installing couchpy script to /usr/local/bin
Installing couchdb-load script to /usr/local/bin
Installing couchdb-replicate script to /usr/local/bin

Installed /usr/local/lib/python2.6/dist-packages/CouchDB-0.8-py2.6.egg
Processing dependencies for couchdb
Finished processing dependencies for couchdb

$ couchdb -h
Usage: couchdb [OPTION]

The couchdb command runs the Apache CouchDB server.

Erlang inherits the environment of this command.

The exit status is 0 for success or 1 for failure.

The `-s' option will exit 0 for running and 1 for not running.

Options:

  -h          display a short help message and exit
  -V          display version information and exit
  -a FILE     add configuration FILE to chain
  -A DIR      add configuration DIR to chain
  -n          reset configuration file chain (including system default)
  -c          print configuration file chain and exit
  -i          use the interactive Erlang shell
  -b          spawn as a background process
  -p FILE     set the background PID FILE (overrides system default)
  -r SECONDS  respawn background process after SECONDS (defaults to no respawn)
  -o FILE     redirect background stdout to FILE (defaults to couchdb.stdout)
  -e FILE     redirect background stderr to FILE (defaults to couchdb.stderr)
  -s          display the status of the background process
  -k          kill the background process, will respawn if needed
  -d          shutdown the background process

Report bugs at <https://issues.apache.org/jira/browse/COUCHDB>.

In the end I do:

$ sudo apt-get install couchdb

The database can be accessed via URL:

Example 1-20. Harvesting tweets via timelines:

$ python recipe__harvest_timeline.py 
Usage: $ recipe__harvest_timeline.py timeline_name [max_pages] [screen_name]

	timeline_name in [public, home, user]
	0 < max_pages <= 16 for timeline_name in [home, user]
	max_pages == 1 for timeline_name == public
Notes:
	* ~800 statuses are available from the home timeline.
	* ~3200 statuses are available from the user timeline.
	* The public timeline updates every 60 secs and returns 20 statuses.
	* See the streaming/search API for additional options to harvest tweets.

$ python recipe__harvest_timeline.py public 10
Fetched 20 tweets
Done fetching tweets

$ curl -X GET http://127.0.0.1:5984/_all_dbs
["tweets-public-timeline"]

Creating a Tag Cloud from Tweet Entities

$ python recipe__tweet_entities_tagcloud.py tweets-public-timeline

The parameter to pass is the database from which the information should be extracted (but for now it does not work).

With WP-Cumulus I can make tag clouds.

Summarizing link targets

Extract the text from the web page, and then use a natural language processing (NLP) toolkit such as the Natural Language Toolkit (NLTK) to help you extract the most important sentences to create a machine-generated abstract.

In other words, with NLTK I can extract the most important keywords from a text.
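
NLTK does this properly (tokenization, stopword lists, sentence scoring); a naive stand-in of my own, using only the standard library, gives the flavor of frequency-based keyword extraction:

```python
import re
from collections import Counter

# A tiny hand-picked stopword list; NLTK ships a much larger one.
STOPWORDS = {'the', 'a', 'an', 'and', 'or', 'to', 'of', 'is', 'in', 'it'}

def top_keywords(text, n=3):
    """Return the n most frequent non-stopword words."""
    words = re.findall(r'[a-z]+', text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]

text = ("Twitter data mining is fun. Mining Twitter data with Python "
        "makes the Twitter API easy to explore.")
print(top_keywords(text))  # 'twitter' should rank first
```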

$ sudo easy_install nltk

but the installation fails, basically because it tries to download the MacOS version.

$ sudo apt-get install python-nltk
$ sudo apt-get install python-beautifulsoup
$ python recipe__summarize_webpage.py 
Traceback (most recent call last):
  File "recipe__summarize_webpage.py", line 8, in <module>
    from BeautifulSoup import BeautifulStoneSoup
ImportError: No module named BeautifulSoup

BeautifulSoup is an HTML/XML parser; here BeautifulStoneSoup is used to decode the odd characters (HTML entities) into plain text.
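
The entity-decoding part that BeautifulStoneSoup handled can be reproduced today with the standard library's html module; a minimal sketch (the snippet string is my own example):

```python
import html

# A hypothetical scraped snippet containing HTML entities.
snippet = 'Barcelona &amp; Python &#8211; caf&eacute; &quot;El Raval&quot;'

# html.unescape converts named and numeric entities to plain characters.
print(html.unescape(snippet))
```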

$ python recipe__summarize_webpage.py http://www.joanillo.org/?lang=ca

it doesn't quite work

Harvesting Friends and Followers

$ python recipe__get_friends_followers.py joanqc 100
Fetched 6 total ids for joanqc
[8039442, 132373965, 108459787, 90407224, 86775205, 14208559]

These 6 correspond to the IDs of the people I follow.

Performing Setwise Operations on Friendship Data

You want to operate on collections of friends and followers to answer questions such as “Who isn’t following me back?”, “Who are my mutual friends?”, and “What friends/followers do certain users have in common?”.

All of this is done with the intersection and difference operations that Python provides on sets:

>>> s1 = set([1,2,3])
>>> s2 = set([2,4,5])
>>> s1.intersection(s2)
set([2])
>>> s1.difference(s2)
set([1, 3])
>>> s2.difference(s1)
set([4, 5])
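
Applied to friend/follower id lists, the questions above translate directly into set expressions; the id values below are hypothetical:

```python
# Hypothetical id sets for one user.
friends = {8039442, 132373965, 108459787, 90407224}
followers = {132373965, 55555555}

not_following_back = friends - followers  # who isn't following me back?
mutual_friends = friends & followers      # who are my mutual friends?
fans = followers - friends                # followers I don't follow back

print(sorted(not_following_back))
print(sorted(mutual_friends))
print(sorted(fans))
```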

We will use redis to perform these operations:

$ sudo apt-get install python-redis
$ python recipe__setwise_operations.py joanqc 100

At first it doesn't work: the redis server has to be running.

$ sudo apt-get install redis-server

and now it works:

$ python recipe__setwise_operations.py joanqc 100
Fetched 6 total friend ids for joanqc
Fetched 0 total follower ids for joanqc
joanqc is following 6
joanqc is being followed by 0
6 of 6 are not following joanqc back
0 of 0 are not being followed back by joanqc
joanqc has 0 mutual friends
$ python recipe__setwise_operations.py Rogerbuch 1000
Fetched 594 total friend ids for Rogerbuch
Fetched 1073 total follower ids for Rogerbuch
Rogerbuch is following 594
Rogerbuch is being followed by 1073
152 of 594 are not following Rogerbuch back
631 of 1073 are not being followed back by Rogerbuch
Rogerbuch has 442 mutual friends

Resolving User Profile Information

If we look at the code, we are fetching information about three people:

    info.update(get_info_by_screen_name(t, ['ptwobrussell', 'socialwebmining']))
    info.update(get_info_by_id(t, ['2384071']))

$ python recipe__get_user_info.py 
{
 "SocialWebMining": {
  "follow_request_sent": false, 
  "profile_use_background_image": true, 
  "profile_background_image_url_https": "https://si0.twimg.com/images/themes/theme5/bg.gif", 
  "verified": false, 
  "profile_image_url_https": "https://si0.twimg.com/profile_images/1154493071/Picture_7_normal.png", 
  "profile_sidebar_fill_color": "99CC33", 
  "is_translator": false, 
  "id": 132373965, 
  "profile_text_color": "3E4415", 
  "followers_count": 717, 
  "profile_sidebar_border_color": "829D5E", 
  "location": "Fine bookstores everywhere", 
  "default_profile_image": false, 
  "id_str": "132373965", 
  "status": {
   "favorited": false, 
   "contributors": null, 
   "truncated": false, 
   "text": "Great post/preso by @davidbliss featuring some @SocialWebMining: Data mining, it's not just for information scientists http://bit.ly/ilxL8E", 
   "created_at": "Mon Jul 04 21:37:06 +0000 2011", 
   "retweeted": false, 
   "in_reply_to_status_id": null, 
   "coordinates": null, 
   "in_reply_to_user_id_str": null, 
   "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Tweetie for Mac</a>", 
   "in_reply_to_status_id_str": null, 
   "place": null, 
   "in_reply_to_user_id": null, 
   "in_reply_to_screen_name": null, 
   "retweet_count": 1, 
   "geo": null, 
   "id": 87998392218820608, 
   "id_str": "87998392218820608"
  }, 
  "utc_offset": null, 
  "statuses_count": 178, 
  "description": "The official Twitter account for the O'Reilly title Mining the Social Web. Get the example code at http://bit.ly/biais2", 
  "friends_count": 0, 
  "profile_link_color": "D02B55", 
  "profile_image_url": "http://a1.twimg.com/profile_images/1154493071/Picture_7_normal.png", 
  "notifications": false, 
  "show_all_inline_media": false, 
  "geo_enabled": false, 
  "profile_background_color": "352726", 
  "profile_background_image_url": "http://a1.twimg.com/images/themes/theme5/bg.gif", 
  "screen_name": "SocialWebMining", 
  "lang": "en", 
  "profile_background_tile": false, 
  "favourites_count": 0, 
  "name": "MiningTheSocialWeb", 
  "url": "http://amzn.to/d1Ci8A", 
  "created_at": "Tue Apr 13 02:10:40 +0000 2010", 
  "contributors_enabled": false, 
  "time_zone": null, 
  "protected": false, 
  "default_profile": false, 
  "following": true, 
  "listed_count": 50
 }, 
 "ptwobrussell": {
  "follow_request_sent": false, 
  "profile_use_background_image": true, 
  "profile_background_image_url_https": "https://si0.twimg.com/profile_background_images/112353046/twitter-1.4.2.gif", 
  "verified": false, 
  "profile_image_url_https": "https://si0.twimg.com/profile_images/1114303240/original_normal.jpg", 
  "profile_sidebar_fill_color": "f7f7f7", 
  "is_translator": false, 
  "id": 13085242, 
  "profile_text_color": "333333", 
  "followers_count": 332, 
  "profile_sidebar_border_color": "888888", 
  "location": "Franklin, TN", 
  "default_profile_image": false, 
  "id_str": "13085242", 
  "status": {
   "favorited": false, 
   "contributors": null, 
   "truncated": false, 
   "text": "Great post/preso by @davidbliss featuring some @SocialWebMining: Data mining, it's not just for information scientists http://bit.ly/ilxL8E", 
   "created_at": "Mon Jul 04 21:37:43 +0000 2011", 
   "retweeted": false, 
   "in_reply_to_status_id": null, 
   "coordinates": null, 
   "in_reply_to_user_id_str": null, 
   "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Tweetie for Mac</a>", 
   "in_reply_to_status_id_str": null, 
   "place": null, 
   "in_reply_to_user_id": null, 
   "in_reply_to_screen_name": null, 
   "retweet_count": 0, 
   "geo": null, 
   "id": 87998546338512896, 
   "id_str": "87998546338512896"
  }, 
  "utc_offset": -21600, 
  "statuses_count": 454, 
  "description": "Specializing in data mining/visualization, agile web solutions, software development and other forms of applied computer science. See also http://amzn.to/d1Ci8A", 
  "friends_count": 66, 
  "profile_link_color": "ee8336", 
  "profile_image_url": "http://a0.twimg.com/profile_images/1114303240/original_normal.jpg", 
  "notifications": false, 
  "show_all_inline_media": false, 
  "geo_enabled": false, 
  "profile_background_color": "888888", 
  "profile_background_image_url": "http://a3.twimg.com/profile_background_images/112353046/twitter-1.4.2.gif", 
  "screen_name": "ptwobrussell", 
  "lang": "en", 
  "profile_background_tile": false, 
  "favourites_count": 191, 
  "name": "Matthew Russell", 
  "url": "http://zaffra.com", 
  "created_at": "Tue Feb 05 08:16:12 +0000 2008", 
  "contributors_enabled": false, 
  "time_zone": "Central Time (US & Canada)", 
  "protected": false, 
  "default_profile": false, 
  "following": false, 
  "listed_count": 47
 }, 
 "2384071": {
  "follow_request_sent": false, 
  "profile_use_background_image": true, 
  "id": 2384071, 
  "verified": true, 
  "profile_image_url_https": "https://si0.twimg.com/profile_images/941827802/IMG_3811_v4_normal.jpg", 
  "profile_sidebar_fill_color": "e0ff92", 
  "is_translator": false, 
  "profile_text_color": "000000", 
  "followers_count": 1478368, 
  "profile_sidebar_border_color": "87bc44", 
  "location": "Sebastopol, CA", 
  "default_profile_image": false, 
  "id_str": "2384071", 
  "status": {
   "favorited": false, 
   "contributors": null, 
   "truncated": false, 
   "text": "@indy_slug Try @instigating, @mikeloukides, @jseelybrown, @cshirky, @brainpicker, @slashdot for good tech and futurism", 
   "created_at": "Tue Jul 05 03:00:37 +0000 2011", 
   "retweeted": false, 
   "in_reply_to_status_id": null, 
   "coordinates": null, 
   "in_reply_to_user_id_str": "18135686", 
   "source": "<a href=\"http://seesmic.com/\" rel=\"nofollow\">Seesmic</a>", 
   "in_reply_to_status_id_str": null, 
   "place": null, 
   "in_reply_to_user_id": 18135686, 
   "in_reply_to_screen_name": "indy_slug", 
   "retweet_count": 4, 
   "geo": null, 
   "id": 88079808596160514, 
   "id_str": "88079808596160514"
  }, 
  "utc_offset": -28800, 
  "statuses_count": 15086, 
  "description": "Founder and CEO, O'Reilly Media. Watching the alpha geeks, sharing their stories, helping the future unfold.", 
  "friends_count": 745, 
  "profile_background_image_url_https": "https://si0.twimg.com/profile_background_images/3587880/notes.gif", 
  "profile_link_color": "0000ff", 
  "profile_image_url": "http://a1.twimg.com/profile_images/941827802/IMG_3811_v4_normal.jpg", 
  "notifications": false, 
  "show_all_inline_media": false, 
  "geo_enabled": true, 
  "profile_background_color": "9ae4e8", 
  "profile_background_image_url": "http://a1.twimg.com/profile_background_images/3587880/notes.gif", 
  "screen_name": "timoreilly", 
  "lang": "en", 
  "profile_background_tile": false, 
  "favourites_count": 27, 
  "name": "Tim O'Reilly", 
  "url": "http://radar.oreilly.com", 
  "created_at": "Tue Mar 27 01:14:05 +0000 2007", 
  "contributors_enabled": false, 
  "time_zone": "Pacific Time (US & Canada)", 
  "protected": false, 
  "default_profile": false, 
  "following": false, 
  "listed_count": 19146
 }
}

joanqc:

 "joanqc": {
  "follow_request_sent": false, 
  "profile_use_background_image": true, 
  "profile_background_image_url_https": "https://si0.twimg.com/images/themes/theme1/bg.png", 
  "verified": false, 
  "profile_image_url_https": "https://si0.twimg.com/profile_images/1421789253/integrated_circuit_normal.jpeg", 
  "profile_sidebar_fill_color": "DDEEF6", 
  "is_translator": false, 
  "id": 204384934, 
  "profile_text_color": "333333", 
  "followers_count": 0, 
  "profile_sidebar_border_color": "C0DEED", 
  "location": "Barcelona", 
  "default_profile_image": false, 
  "id_str": "204384934", 
  "status": {
   "favorited": false, 
   "contributors": null, 
   "truncated": false, 
   "text": "B\u00e9, m'estic introduint amb aix\u00f2 del Tweeter, acabo de modificar el perfil i ficar una foto", 
   "created_at": "Fri Jul 01 11:52:31 +0000 2011", 
   "retweeted": false, 
   "in_reply_to_status_id": null, 
   "coordinates": null, 
   "in_reply_to_user_id_str": null, 
   "source": "web", 
   "in_reply_to_status_id_str": null, 
   "place": null, 
   "in_reply_to_user_id": null, 
   "in_reply_to_screen_name": null, 
   "retweet_count": 0, 
   "geo": null, 
   "id": 86764116244561921, 
   "id_str": "86764116244561921"
  }, 
  "utc_offset": null, 
  "statuses_count": 1, 
  "description": "Media Art, electronics, music, sound design, tinkering, programming, hardware,...", 
  "friends_count": 6, 
  "profile_link_color": "0084B4", 
  "profile_image_url": "http://a1.twimg.com/profile_images/1421789253/integrated_circuit_normal.jpeg", 
  "notifications": false, 
  "show_all_inline_media": false, 
  "geo_enabled": false, 
  "profile_background_color": "C0DEED", 
  "profile_background_image_url": "http://a0.twimg.com/images/themes/theme1/bg.png", 
  "screen_name": "joanqc", 
  "lang": "es", 
  "profile_background_tile": false, 
  "favourites_count": 0, 
  "name": "Joan Quintana", 
  "url": "http://www.joanillo.org", 
  "created_at": "Mon Oct 18 15:42:59 +0000 2010", 
  "contributors_enabled": false, 
  "time_zone": null, 
  "protected": false, 
  "default_profile": true, 
  "following": false, 
  "listed_count": 0
 }

Crawling Followers to Approximate Potential Influence

...popularity of their followers... and then count the number of nodes in the graph.

In this case we use depth=2, which means that, to estimate a person's influence, we look at the followers of their followers:

def crawl_followers(t, r, follower_ids, limit=1000000, depth=2):

$ python recipe__crawl.py Rogerbuch
Fetched 1073 total ids for 108459787
Encountered 401 Error (Not Authorized)
Fetched 0 total ids for 207939856
Fetched 8 total ids for 298037582
Fetched 18 total ids for 313427170
Fetched 57 total ids for 106774808
Fetched 151 total ids for 221788318
...
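
The crawl itself is a breadth-first traversal; with the Twitter calls replaced by a hardcoded follower table, the depth-2 logic can be sketched like this (the table and user names are hypothetical, and this is my sketch of the idea rather than the recipe's code):

```python
def crawl_followers(get_followers, root, depth=2):
    """Breadth-first collection of follower ids down to the given depth."""
    seen = set()
    frontier = [root]
    for _ in range(depth):
        next_frontier = []
        for user in frontier:
            for follower in get_followers(user):
                if follower not in seen:
                    seen.add(follower)
                    next_frontier.append(follower)
        frontier = next_frontier
    return seen

# Hypothetical follower table standing in for the Twitter API calls.
FOLLOWERS = {
    'Rogerbuch': ['a', 'b'],
    'a': ['c', 'd'],
    'b': ['d', 'e'],
}
reached = crawl_followers(lambda u: FOLLOWERS.get(u, []), 'Rogerbuch')
print(len(reached))  # nodes reached within two hops
```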

Analyzing Friendship Relationships such as Friends of Friends

...amongst users, such as friends of friends... a graph toolkit like NetworkX that offers native graph operations.

$ python recipe__create_friendship_graph.py Rogerbuch
Processing user with id 108459787
Pickle file stored in out/Rogerbuch-friendships.gpickle

It would remain to finish this off by visualizing the relationships graphically...

Analyzing Friendship Cliques

For example, a triangle is an example of a 3-clique since it contains only three nodes and all nodes are connected.

Starting from the file obtained in the previous exercise:

$ python recipe__clique_analysis.py out/Rogerbuch-friendships.gpickle

but it throws an error...

Analyzing the Authors of Tweets that Appear in Search Results

This recipe analyzes the authors of tweets that appear in search results.

The screen name is extracted from each search result object in order to look up profile information using either the /users/show or /users/lookup resources.

$ python recipe__analyze_users_in_search_results.py arduino

it works, but it prints nothing to the screen

Visualizing Geodata with a Dorling Cartogram

This recipe collects location data (from user profile information included in a batch of tweets, such as the results of a search query) in order to determine if there is a correlation between location and some other criterion.

It then visualizes the data with a Dorling Cartogram.

A Dorling Cartogram is essentially a bubble chart where each bubble corresponds to a geographic area such as a state, and each bubble is situated as close as possible to its actual location on a map without overlapping with any other bubbles (see Figure 1-3). Since the size and/or color of each bubble can be used to represent meaningful things, a Dorling Cartogram can give you a very intuitive view of data as it relates to geographic boundaries or regions of a larger land mass. The Protovis toolkit comes with some machinery for creating Dorling Cartograms for locations in the United States,

I download and unpack the file mbostock-protovis-v3.3.1-0-g1a61bac

I download version 3.2, which is the one the book uses (the Protovis developers are now working on a better tool...)

$ python recipe__dorling_cartogram.py arduino
Traceback (most recent call last):
  File "recipe__dorling_cartogram.py", line 124, in <module>
    'out/dorling_cartogram')
  File "/usr/lib/python2.6/shutil.py", line 140, in copytree
    names = os.listdir(src)
OSError: [Errno 2] No such file or directory: 'etc/protovis/dorling_cartogram'

It does not fully work, so I cannot see the Dorling cartogram functionality. In any case, Protovis and its successor are well worth a look.

Geocoding Locations from Profiles (or Elsewhere)

This recipe geocodes locations from profiles (or elsewhere), for situations beyond what resources such as /status currently support.

The results can then be visualized with a mapping tool of choice, such as Google Maps.

$ sudo easy_install geopy

the first argument is the Google Maps API key.

Twitter API with C++: twitcurl

The idea of using curl is the same one we saw with the CouchDB database: curl is a way to issue URL requests from the command line.

I download twitterClient.zip, a complete example of a Twitter client (unfortunately it is a Windows exe). The source code is also provided, but unfortunately it is VC++.

There is, however, a version for Linux:

A separate branch (libtwitcurl) of twitCurl library exists for UNIX/Linux distributions. Follow these steps to build library:
  * Make sure you have g++ and dependent packages. If you don't have g++, then install it using package manager. For example, in Ubuntu this is done as follows:
{{{sudo apt-get install g++}}}
  * Install libcurl development package using package manager. For example, in Ubuntu this is done as follows:
{{{sudo apt-get install libcurl4-dev}}}
  * Install SVN client. Again, in Ubuntu this is done as follows:
{{{sudo apt-get install subversion}}}
  * Check-out libtwitcurl using this command:
{{{svn co http://code.google.com/p/twitcurl/source/browse/svn/branches/libtwitcurl}}}
  * Build the library using the command {{{make}}}.

The svn checkout does not quite work, so I download the files directly from:

Note: what does work is:

$ svn co http://twitcurl.googlecode.com/svn/branches/libtwitcurl
$ make

It compiled fine and libtwitcurl.so.1.0 was created (we have built the library). Now I need to learn how to use this library...

How to use the twitcurl library?

twitcurl is an open-source pure C++ library for twitter REST APIs. Currently, it has support for most of the twitter APIs and it will be updated to support all the APIs. twitcurl uses cURL library for handling HTTP requests and responses. Building applications using twitcurl is quite easy:

Now what has to be done is:

Copy twitcurl.h and oauthlib.h to either your application's directory or to a common directory like /usr/local/include/ or /usr/include/.

$ sudo cp twitcurl.h /usr/include
$ sudo cp oauthlib.h /usr/include

Copy libtwitcurl.so.1.0 to a suitable directory like /usr/local/lib/ or /usr/lib. Also, create symlinks named libtwitcurl.so.1 and libtwitcurl.so in the same directory where libtwitcurl.so.1.0 was copied. Example:

$ sudo cp libtwitcurl.so.1.0 /usr/local/lib
$ sudo ln -sf /usr/local/lib/libtwitcurl.so.1.0 /usr/local/lib/libtwitcurl.so.1
$ sudo ln -sf /usr/local/lib/libtwitcurl.so.1.0 /usr/local/lib/libtwitcurl.so

Link your application with twitcurl library (-ltwitcurl). For example:

$ g++ -ltwitcurl yourapp.cpp -o yourapp

To run your application, make sure that the LD_LIBRARY_PATH environment variable contains the directory path where libtwitcurl.so.1 is present. If the directory path is not present in LD_LIBRARY_PATH, add it using the export command. For example:

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

The minimal working code using the twitcurl library is (file yourapp.cpp):

#include <cstdio>
#include <iostream>
#include <fstream>
#include "twitcurl.h"

int main( int argc, char* argv[] )
{
    twitCurl twitterObj;   // instantiating the class proves we link against libtwitcurl

    return 0;
}
$ g++ -ltwitcurl yourapp.cpp -o yourapp

It compiles fine, I have the libtwitcurl library (which depends on libcurl) correctly linked, and in addition I create an instance of the twitCurl class.

However...

./yourapp
./yourapp: error while loading shared libraries: libtwitcurl.so.1: cannot open shared object file: No such file or directory
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
$ ./yourapp 

Now it works!

From here on I can use all the methods and functions exposed in the public API (http://code.google.com/p/twitcurl/), and as sample code I can use twutterClient_SRC (even though it is VC++)

I gradually adapt the twutterClient_SRC code to my application

$ ./yourapp -u joanqc -p jq****

Using:
Key: 204384934-A53w1K5w2htICxMEJqqxGG0l5tdzLqExczwTo4LU
Secret: WyuPo5xmrm4OJtQGyK7AOjTx8SI2J4NeyP6v5NGyS7s

I get as far as the part that creates a new tweet from the console:

    /* Post a new status message */
    char statusMsg[1024];
    memset( statusMsg, 0, 1024 );
    printf( "\nEnter a new status message: " );
    fgets( statusMsg, sizeof( statusMsg ), stdin );  /* fgets instead of the unsafe gets */
    statusMsg[strcspn( statusMsg, "\n" )] = '\0';    /* drop the trailing newline */
    tmpStr = statusMsg;
    replyMsg = "";
    if( twitterObj.statusUpdate( tmpStr ) )
    {
        twitterObj.getLastWebResponse( replyMsg );
        printf( "\ntwitterClient:: twitCurl::statusUpdate web response:\n%s\n", replyMsg.c_str() );
    }
    else
    {
        twitterObj.getLastCurlError( replyMsg );
        printf( "\ntwitterClient:: twitCurl::statusUpdate error:\n%s\n", replyMsg.c_str() );
    }

and indeed, going to http://twitter.com, I can see it has been created:

(screenshot: Twitter missatge.png)

All the code works perfectly, except that I cannot get the Get friend ids part to work.

To find out which functions and methods I can use with twitcurl I have to look at the twitcurl.h file, and remember that the official documentation of the Twitter API is at:

and that what twitcurl actually implements is listed at

which is not everything. For example, I looked at the trends resources available under Local Trends Resources > trends available... but trends/1 is not available, so I cannot get the trends for, say, Rio de Janeiro.

JSON parser for C++

Now that the Twitter API works from C++, and given that the output of this API is JSON documents, I need a JSON parser that lets me navigate these documents and extract information from them.

The best solution seems to be JsonCpp:

I got tangled up in the installation, so I opt for another possibility:

$ make

g++ -c -Wall src/JSON.cpp -o obj/JSON.o
g++ -c -Wall src/JSONValue.cpp -o obj/JSONValue.o
g++ -c -Wall src/demo/nix-main.cpp -o obj/demo/nix-main.o
g++ -c -Wall src/demo/example.cpp -o obj/demo/example.o
g++ -c -Wall src/demo/testcases.cpp -o obj/demo/testcases.o
g++ -lm obj/JSON.o obj/JSONValue.o obj/demo/nix-main.o obj/demo/example.o obj/demo/testcases.o -o JSONDemo

It compiles (and I can see the instructions for compiling the project) and the examples work. Now it only remains to integrate it into my project.

Let's simplify the process:

Put all the .cpp and .h files, as well as JSON.h and functions.h, in the same directory (and make sure the include paths point to that same directory):

g++ -c -Wall JSON.cpp -o JSON.o
g++ -c -Wall JSONValue.cpp -o JSONValue.o
g++ -c -Wall nix-main.cpp -o nix-main.o
g++ -c -Wall example.cpp -o example.o
g++ -c -Wall testcases.cpp -o testcases.o
g++ -lm JSON.o JSONValue.o nix-main.o example.o testcases.o -o JSONDemo

to clean up:

rm *.o
rm JSONDemo

and now the same in one go:

g++ -c -Wall JSON.cpp JSONValue.cpp nix-main.cpp example.cpp testcases.cpp
g++ -lm JSON.o JSONValue.o nix-main.o example.o testcases.o -o JSONDemo

and now I write my first example: prova1.cpp

$ g++ -c -Wall JSON.cpp JSONValue.cpp prova1.cpp 
$ g++ -lm JSON.o JSONValue.o prova1.o -o prova1
$ ./prova1

prova1.cpp puts examples 1 and 2 into a single file (reading a JSON string and writing a JSON string):

#include <string>
#include <iostream>
#include <iterator>
#include <sstream>
#include <cstdlib>   // srand, rand
#include <clocale>   // setlocale
#include <time.h>

#include "JSON.h"
#include "functions.h"

using namespace std;

// Just some sample JSON text, feel free to change but could break demo
const wchar_t* EXAMPLE = L"\
{ \
	\"string_name\" : \"string\tvalue and a \\\"quote\\\" and a unicode char \\u00BE and a c:\\\\path\\\\ or a \\/unix\\/path\\/ :D\", \
	\"bool_name\" : true, \
	\"bool_second\" : FaLsE, \
	\"null_name\" : nULl, \
	\"negative\" : -34.276, \
	\"sub_object\" : { \
						\"foo\" : \"abc\", \
						 \"bar\" : 1.35e2, \
						 \"blah\" : { \"a\" : \"A\", \"b\" : \"B\", \"c\" : \"C\" } \
					}, \
	\"array_letters\" : [ \"a\", \"b\", \"c\", [ 1, 2, 3  ]  ] \
}    ";

// Print out function
void print_out(const wchar_t* output)
{
	wcout << output;
	wcout.flush();
}

// Forward declarations (example1 and example2 are defined after main)
void example1();
void example2();

// Linux entry point
int main(int argc, char **argv)
{
	// Required for utf8 chars
	setlocale(LC_CTYPE, "");
	
	example1();
	example2();

	return 0;
}

// Example 1
void example1()
{
	// Parse example data
	JSONValue *value = JSON::Parse(EXAMPLE);
		
	// Did it go wrong?
	if (value == NULL)
	{
		print_out(L"Example code failed to parse, did you change it?\r\n");
	}
	else
	{
		// Retrieve the main object
		JSONObject root;
		if (value->IsObject() == false)
		{
			print_out(L"The root element is not an object, did you change the example?\r\n");
		}
		else
		{
			root = value->AsObject();
			
			// Retrieving a string
			if (root.find(L"string_name") != root.end() && root[L"string_name"]->IsString())
			{
				print_out(L"string_name:\r\n");
				print_out(L"------------\r\n");
				print_out(root[L"string_name"]->AsString().c_str());
				print_out(L"\r\n\r\n");
			}
		
			// Retrieving a boolean
			if (root.find(L"bool_second") != root.end() && root[L"bool_second"]->IsBool())
			{
				print_out(L"bool_second:\r\n");
				print_out(L"------------\r\n");
				print_out(root[L"bool_second"]->AsBool() ? L"it's true!" : L"it's false!");
				print_out(L"\r\n\r\n");
			}
			
			// Retrieving an array
			if (root.find(L"array_letters") != root.end() && root[L"array_letters"]->IsArray())
			{
				JSONArray array = root[L"array_letters"]->AsArray();
				print_out(L"array_letters:\r\n");
				print_out(L"--------------\r\n");
				for (unsigned int i = 0; i < array.size(); i++)
				{
					wstringstream output;
					output << L"[" << i << L"] => " << array[i]->Stringify() << L"\r\n";
					print_out(output.str().c_str());
				}
				print_out(L"\r\n");
			}
			
			// Retrieving nested object
			if (root.find(L"sub_object") != root.end() && root[L"sub_object"]->IsObject())
			{
				print_out(L"sub_object:\r\n");
				print_out(L"-----------\r\n");
				print_out(root[L"sub_object"]->Stringify().c_str());
				print_out(L"\r\n\r\n");
			}
		}

		delete value;
	}
}

// Example 2
void example2()
{
	JSONObject root;
		
	// Adding a string
	root[L"test_string"] = new JSONValue(L"hello world");
		
	// Create a random integer array
	JSONArray array;
	srand((unsigned)time(0));
	for (int i = 0; i < 10; i++)
		array.push_back(new JSONValue((double)(rand() % 100)));
	root[L"sample_array"] = new JSONValue(array);
		
	// Create a value
	JSONValue *value = new JSONValue(root);
		
	// Print it
	print_out(value->Stringify().c_str());

	// Clean up
	delete value;
}

Finally, prova3.cpp reads the file json4.json, loads it into memory and parses it. The file is an example of the different cases we can find in a JSON string: a string (with special characters), a number, an array, a nested object. In the nested-object case, first the contents of the whole nested object are printed (with Stringify()), and then the nested object is parsed. Yet another case is an array of objects, that is, an array of JSON objects. Here, as shown in the prova3.cpp example, we print the contents of the array and then parse the array's objects to see what is inside:

array_objects:
--------------
[0] => {"obj1":"aquest es el obj1"}
[1] => {"obj2":"aquest es el obj2"}
[2] => {"obj3":"aquest es el obj3"}

obj1:
------------
aquest es el obj1

For me this is an important case because it is exactly the real case we find in the yourapp2 code, where we want to use twitterObj.trendsAvailableGet() to find the places (countries, cities) where local trends are available. What the call returns as a string is an array of objects (only the part enclosed between the [ and ] markers, inclusive).

twitcurl + JSON parser: yourapp3

Now it is a matter of putting the two techniques together. On the one hand, query some aspect of Twitter that returns a JSON string. Then, parse that JSON string to extract the relevant information.

I try to get user information (userGet), and the problem is that the output is XML, not JSON. So I will also need an XML parser (see yourapp5).

    /* userGet */
    tmpStr = "Rogerbuch";
    //tmpStr = "108459787";
    if( twitterObj.userGet( tmpStr, false ) )  // false: look up by screen name; true: by numeric id
    //if( twitterObj.userGet( tmpStr, true ) )
<?xml version="1.0" encoding="UTF-8"?>
<user>
  <id>108459787</id>
  <name>Roger Buch i Ros</name>
  <screen_name>Rogerbuch</screen_name>
  <location>Barcelona</location>
...

Get trends available does return a JSON string:

$ ./yourapp2 -u joanqc -p jq****

Using:
Key: 204384934-A53w1K5w2htICxMEJqqxGG0l5tdzLqExczwTo4LU
Secret: WyuPo5xmrm4OJtQGyK7AOjTx8SI2J4NeyP6v5NGyS7s


twitterClient:: twitCurl::trendsAvailableGet web response:
[{"woeid":23424969,"url":"http:\/\/where.yahooapis.com\/v1\/place\/23424969","country":"Turkey","parentid":1,"countryCode":"TR","placeType":{"code":12,"name":"Country"},"name":"Turkey"},{"name":"Birmingham","parentid":23424977,"countryCode":"US","placeType":{"name":"Town","code":7},"url":"http:\/\/where.yahooapis.com\/v1\/place\/2364559","country":"United States","woeid":2364559},{"name":"Caracas","parentid":23424982,"countryCode":"VE","placeType":{"name":"Town","code":7},"url":"http:\/\/where.yahooapis.com\/v1\/place\/395269","country":"Venezuela","woeid":395269},{"name":"Bandung","countryCode":"ID","url":"http:\/\/where.yahooapis.com\/v1\/place\/1047180","parentid":23424846,"country":"Indonesia","woeid":1047180,"placeType":{"name":"Town","code":7}},

What I want is to obtain a list of woeid-name-parentid triples, for example (looking at the sample data):

23424969, Turkey, 
2364559, Birmingham, 23424977
395269, Caracas, 23424982
1047180, Bandung, 23424846
...

We can also find out the type of trend (town, country, ...), which country it belongs to, and so on. Curiously, the information comes back in no particular order, but that need not concern us.

The problem is that what I get as a string is an array of JSON objects (each of them a valid JSON string).

Note that our output starts with [ and ends with ]. So it is an array of objects, as seen in:

{
    "trends": [
        {
            "name": "#yepthatsme",
            "url": "http://search.twitter.com/search?q=%23yepthatsme"
        },
        {
            "name": "Miley Citrus",
            "url": "http://search.twitter.com/search?q=Miley+Citrus"
        },
        /* lots more */
        {
            "name": "Keith Olbermann",
            "url": "http://search.twitter.com/search?q=Keith+Olbermann"
        }
    ],
    "as_of": "Sat, 22 Jan 2011 13:37:25 +0000"
}

In the example, what comes back is an object with a trends property (or whatever the name would be, since no property name appears in my output), and that property is an array of objects:

{
    "trends": [
...
]}

What I get in my output, though, is only:

[{...},{...},...]

so if I want to parse it as JSON I have to finish formatting my string properly, prepending and appending the missing parts.

So I build the file yourapp3.cpp starting from yourapp2.cpp and prova3.cpp. It is a matter of combining the libraries contained in the two projects, and the way to compile and run will be:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib   (I should arrange for this not to be necessary (TBD))

g++ -c -Wall JSON.cpp JSONValue.cpp yourapp3.cpp 
g++ -lm -ltwitcurl JSON.o JSONValue.o yourapp3.o -o yourapp3
./yourapp3 -u joanqc -p jq****

Problem: in theory everything should work. In practice I run into instability when parsing a JSON file. For example, a file as simple as:

{"trends": [
{"countryCode":"TR","woeid":23424969,"name":"Turkey","parentid":1}
]}

is sometimes parsed fine, and sometimes I get Example code failed to parse, did you change it? (TBD)

Until I solve that, a JSON string can be analyzed without a parser: simply walk the string, knowing what to expect and between which delimiters it sits: yourapp4

yourapp4

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
$ g++ -ltwitcurl yourapp4.cpp -o yourapp4
$ ./yourapp4 -u joanqc -p jq****

The yourapp4 example shows how to parse a JSON string by hand, with string functions. You only need to know what you are looking for and what structure the data has.

In this example I fetch the JSON information coming from the Internet and dump it to a file, but first I break it up with newlines wherever convenient. That lets me reopen it and analyze the file line by line. The output of yourapp4 looks like:

------------
namesec: Town
woeid: 2442047
parentid: 23424977
nameppal: Los Angeles
------------
parentid: 1
nameppal: United Arab Emirates
namesec: Country
woeid: 23424738
------------
namesec: Town
woeid: 2418046
parentid: 23424977
nameppal: Harrisburg
------------

As can be seen, the parentid of a Town refers to its Country, so this information can easily be dumped into a database. Parsing this way works well, but I have to keep in mind the different cases and possible exceptions. For example, here I had to include the following lines because there is a special case in which there is no parentid.

// special case: "Worldwide" has no parentid
if (str_nameppal == "Worldwide") {
    str_nameppal = "";
    str_parentid = "";
    cout << "parentid: " << str_parentid << endl;
    num_atributs++;
}

An interesting thing to realize is that the collected data comes back in no particular order, so no assumption can be made about which field comes next.

XML parser for C++

TinyXML is a simple, small, minimal C++ XML parser that can be easily integrated into other programs. It reads XML and creates C++ objects representing the XML document. The objects can be manipulated, changed, and saved again as XML.

$ cd tinyxml

$ make
g++ -c -Wall -Wno-unknown-pragmas -Wno-format -O3   tinyxml.cpp -o tinyxml.o
g++ -c -Wall -Wno-unknown-pragmas -Wno-format -O3   tinyxmlparser.cpp -o tinyxmlparser.o
g++ -c -Wall -Wno-unknown-pragmas -Wno-format -O3   xmltest.cpp -o xmltest.o
g++ -c -Wall -Wno-unknown-pragmas -Wno-format -O3   tinyxmlerror.cpp -o tinyxmlerror.o
g++ -c -Wall -Wno-unknown-pragmas -Wno-format -O3   tinystr.cpp -o tinystr.o
g++ -o xmltest  tinyxml.o tinyxmlparser.o xmltest.o tinyxmlerror.o tinystr.o

it compiles and the xmltest executable is generated, which is a complete example of what the library can do

$ ./xmltest 
** Demo doc read from disk: ** 

** Printing via doc.Print **
<?xml version="1.0" standalone="no" ?>
<!-- Our to do list data -->
<ToDo>
    <!-- Do I need a secure PDA? -->
    <Item priority="1" distance="close">Go to the
        <bold>Toy store!</bold>
    </Item>
    <Item priority="2" distance="none">Do bills</Item>
    <Item priority="2" distance="far & back">Look for Evil Dinosaurs!</Item>
</ToDo>
** Printing via TiXmlPrinter **
<?xml version="1.0" standalone="no" ?>
<!-- Our to do list data -->
<ToDo>
    <!-- Do I need a secure PDA? -->
    <Item priority="1" distance="close">
        Go to the
        <bold>Toy store!</bold>
    </Item>
    <Item priority="2" distance="none">Do bills</Item>
    <Item priority="2" distance="far & back">Look for Evil Dinosaurs!</Item>
</ToDo>

** Demo doc processed: ** 

<?xml version="1.0" standalone="no" ?>
<!-- Our to do list data -->
<ToDo>
    <!-- Do I need a secure PDA? -->
    <Item priority="2" distance="close">Go to the
        <bold>Toy store!</bold>
    </Item>
    <Item priority="1" distance="far">Talk to:
        <Meeting where="School">
            <Attendee name="Marple" position="teacher" />
            <Attendee name="Voel" position="counselor" />
        </Meeting>
        <Meeting where="Lunch" />
    </Item>
    <Item priority="2" distance="here">Do bills</Item>
</ToDo>
[pass] Root element exists. [1][1]
[pass] Root element value is 'ToDo'. [ToDo][ToDo]
[pass] First child exists & is a comment. [1][1]
[pass] Sibling element exists & is an element. [1][1]
[pass] Value is 'Item'. [Item][Item]
[pass] First child exists. [1][1]
[pass] Value is 'Go to the'. [Go to the][Go to the]

** Iterators. **
[pass] Top level nodes, using First / Next. [3][3]
[pass] Top level nodes, using Last / Previous. [3][3]
[pass] Top level nodes, using IterateChildren. [3][3]
[pass] Children of the 'ToDo' element, using First / Next. [3][3]
[pass] 'Item' children of the 'ToDo' element, using First/Next. [3][3]
[pass] 'Item' children of the 'ToDo' element, using Last/Previous. [3][3]
[pass] Error row [3][3]
[pass] Error column [17][17]
[pass] Query attribute: int as double [0][0]
[pass] Query attribute: int as double [1][1]
[pass] Query attribute: double as double [2][2]
[pass] Query attribute: double as int [0][0]
[pass] Query attribute: double as int [2][2]
[pass] Query attribute: not a number [2][2]
[pass] Query attribute: does not exist [1][1]
[pass] Attribute round trip. c-string. [strValue][strValue]
[pass] Attribute round trip. int. [1][1]
[pass] Attribute round trip. double. [-1][-1]
[pass] Location tracking: Tab 8: room row [1][1]
[pass] Location tracking: Tab 8: room col [49][49]
[pass] Location tracking: Tab 8: doors row [1][1]
[pass] Location tracking: Tab 8: doors col [55][55]
[pass] Location tracking: Declaration row [1][1]
[pass] Location tracking: Declaration col [5][5]
[pass] Location tracking: room row [1][1]
[pass] Location tracking: room col [45][45]
[pass] Location tracking: doors row [1][1]
[pass] Location tracking: doors col [51][51]
[pass] Location tracking: Comment row [2][2]
[pass] Location tracking: Comment col [3][3]
[pass] Location tracking: text row [3][3]
[pass] Location tracking: text col [24][24]
[pass] Location tracking: door0 row [3][3]
[pass] Location tracking: door0 col [5][5]
[pass] Location tracking: door1 row [4][4]
[pass] Location tracking: door1 col [5][5]

** UTF-8 **
[pass] UTF-8: Russian value.
[pass] UTF-8: Russian value row. [4][4]
[pass] UTF-8: Russian value column. [5][5]
[pass] UTF-8: Browsing russian element name.
[pass] UTF-8: Russian element name row. [7][7]
[pass] UTF-8: Russian element name column. [47][47]
[pass] UTF-8: Declaration column. [1][1]
[pass] UTF-8: Document column. [1][1]
[pass] UTF-8: Verified multi-language round trip. [1][1]
[pass] Legacy encoding: Verify text element. [résumé][résumé]

** Copy and Assignment **
[pass] Copy/Assign: element copy #1. [element][element]
[pass] Copy/Assign: element copy #2. [value][value]
[pass] Copy/Assign: element assign #1. [element][element]
[pass] Copy/Assign: element assign #2. [value][value]
[pass] Copy/Assign: element assign #3. [1][1]
[pass] Copy/Assign: comment copy. [comment][comment]
[pass] Copy/Assign: comment assign. [comment][comment]
[pass] Copy/Assign: unknown copy. [[unknown]][[unknown]]
[pass] Copy/Assign: unknown assign. [[unknown]][[unknown]]
[pass] Copy/Assign: text copy. [TextNode][TextNode]
[pass] Copy/Assign: text assign. [TextNode][TextNode]
[pass] Copy/Assign: declaration copy. [UTF-8][UTF-8]
[pass] Copy/Assign: text assign. [UTF-8][UTF-8]
[pass] GetText() normal use. [This is text][This is text]
[pass] GetText() contained element. [1][1]
[pass] GetText() partial. [This is ][This is ]
<xmlElement>
    <![CDATA[I am > the rules!
...since I make symbolic puns]]>
</xmlElement>
[pass] CDATA parse.
[pass] CDATA copy.
[pass] CDATA with all bytes #1.
<xmlElement>
    <![CDATA[<b>I am > the rules!</b>
...since I make symbolic puns]]>
</xmlElement>
[pass] CDATA parse. [ 1480107 ]
[pass] CDATA copy. [ 1480107 ]

** Fuzzing... **
** Fuzzing Complete. **

** Bug regression tests **
[pass] Test InsertBeforeChild on empty node. [1][1]
[pass] Test InsertAfterChild on empty node.  [1][1]
[pass] Basic TiXmlString test.  [Hello World!][Hello World!]
[pass] Entity transformation: read. 
[pass] Entity transformation: write. 
[pass] dot in element attributes and names [0][0]
[pass] Entity with one digit. [1][1]
[pass] Entity with one digit. [1.1 Start easy ignore fin thickness
][1.1 Start easy ignore fin thickness
]
[pass] Correct value of unknown. [!DOCTYPE PLAY SYSTEM 'play.dtd'][!DOCTYPE PLAY SYSTEM 'play.dtd']
[pass] Comment formatting. [ Somewhat<evil> ][ Somewhat<evil> ]
[pass] White space kept. [ This has leading and trailing space ][ This has leading and trailing space ]
[pass] White space kept. [This has  internal space][This has  internal space]
[pass] White space kept. [ This has leading, trailing, and  internal space ][ This has leading, trailing, and  internal space ]
[pass] White space condensed. [This has leading and trailing space][This has leading and trailing space]
[pass] White space condensed. [This has internal space][This has internal space]
[pass] White space condensed. [This has leading, trailing, and internal space][This has leading, trailing, and internal space]
[pass] Parsing repeated attributes. [1][1]
[pass] Embedded null throws error. [1][1]
[pass] ISO-8859-1 Parsing. [C�nt�nt�������][C�nt�nt�������]
[pass] Empty document error TIXML_ERROR_DOCUMENT_EMPTY [12][12]
[pass] Empty tinyxml string compare equal [1][1]
[pass] Empty tinyxml string compare equal [1][1]
[pass] Test safe error return. [0][0]
[pass] Low entities. [][]
<test>&#x0E;</test>
[pass] Throw error with bad end quotes. [1][1]
[pass] Document only at top level. [1][1]
[pass] Document only at top level. [15][15]
[pass] Missing end tag at end of input [1][1]
[pass] Missing end tag with trailing whitespace [1][1]
[pass] Comments ignore entities.
[pass] Comments ignore entities.
[pass] Comments iterate correctly. [3][3]
[pass] Handle end tag whitespace [0][0]
[pass] Infinite loop test. [1][1]
[pass] Odd XML parsing. [tag][tag]

Pass 109, Fail 0

yourapp5: twitcurl + tinyXML

The yourapp5 example is what I had in yourapp2 (without tinyxml), and the yourapp5b.cpp example does incorporate tinyxml.

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
$ g++ -ltwitcurl yourapp5.cpp -o yourapp5
$ ./yourapp5 -u joanqc -p jq****
Please visit this link in web browser and authorize this application:
http://twitter.com/oauth/authorize?oauth_token=85Ec6IW3rRQmHsJLBarAyh3IbULPXLjARsOdoyZUrI
Enter the PIN provided by twitter: 8863516

twitterClient:: twitCurl::userGet web response:
<?xml version="1.0" encoding="UTF-8"?>
<user>
  <id>108459787</id>
  <name>Roger Buch i Ros</name>
  <screen_name>Rogerbuch</screen_name>
  <location>Barcelona</location>
  <description>Politòleg. Anàlisi electoral, associacionisme, participació, independentisme.</description>
  <profile_image_url>http://a2.twimg.com/profile_images/1423823730/roger_normal.jpg</profile_image_url>
  <profile_image_url_https>https://si0.twimg.com/profile_images/1423823730/roger_normal.jpg</profile_image_url_https>
  <url>http://blocdenroger.blogspot.com</url>
  <protected>false</protected>
  <followers_count>1111</followers_count>
  <profile_background_color>FFF04D</profile_background_color>
  <profile_text_color>333333</profile_text_color>
  <profile_link_color>0099CC</profile_link_color>
  <profile_sidebar_fill_color>f6ffd1</profile_sidebar_fill_color>
  <profile_sidebar_border_color>fff8ad</profile_sidebar_border_color>
  <friends_count>621</friends_count>
  <created_at>Tue Jan 26 00:58:41 +0000 2010</created_at>
  <favourites_count>22</favourites_count>
  <utc_offset>-36000</utc_offset>
  <time_zone>Hawaii</time_zone>
  <profile_background_image_url>http://a1.twimg.com/images/themes/theme19/bg.gif</profile_background_image_url>
  <profile_background_image_url_https>https://si0.twimg.com/images/themes/theme19/bg.gif</profile_background_image_url_https>
  <profile_background_tile>false</profile_background_tile>
  <profile_use_background_image>true</profile_use_background_image>
  <notifications>false</notifications>
  <geo_enabled>false</geo_enabled>
  <verified>false</verified>
  <following>true</following>
  <statuses_count>2621</statuses_count>
  <lang>es</lang>
  <contributors_enabled>false</contributors_enabled>
  <follow_request_sent>false</follow_request_sent>
  <listed_count>59</listed_count>
  <show_all_inline_media>false</show_all_inline_media>
  <default_profile>false</default_profile>
  <default_profile_image>false</default_profile_image>
  <is_translator>false</is_translator>
  <status>
    <created_at>Mon Jul 18 08:18:37 +0000 2011</created_at>
    <id>92870878144102400</id>
    <text>RT @consult_estudis: No parem, @xarxanetorg tornarà a tenir nou disseny! Endavant</text>
    <source><a href="http://twitter.com/#!/download/iphone" rel="nofollow">Twitter for iPhone</a></source>
    <truncated>false</truncated>
    <favorited>false</favorited>
    <in_reply_to_status_id></in_reply_to_status_id>
    <in_reply_to_user_id></in_reply_to_user_id>
    <in_reply_to_screen_name></in_reply_to_screen_name>
    <retweet_count>1</retweet_count>
    <retweeted>false</retweeted>
    <retweeted_status>
      <created_at>Mon Jul 18 08:16:38 +0000 2011</created_at>
      <id>92870377251934208</id>
      <text>No parem, @xarxanetorg tornarà a tenir nou disseny! Endavant</text>
      <source><a href="http://twitter.com/#!/download/iphone" rel="nofollow">Twitter for iPhone</a></source>
      <truncated>false</truncated>
      <favorited>false</favorited>
      <in_reply_to_status_id></in_reply_to_status_id>
      <in_reply_to_user_id></in_reply_to_user_id>
      <in_reply_to_screen_name></in_reply_to_screen_name>
      <retweet_count>1</retweet_count>
      <retweeted>false</retweeted>
      <geo/>
      <coordinates/>
      <place/>
      <contributors/>
    </retweeted_status>
    <geo/>
    <coordinates/>
    <place/>
    <contributors/>
  </status>
</user>

To compile yourapp5b:

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

g++ -c -Wall -Wno-unknown-pragmas -Wno-format -O3   yourapp5b.cpp -o yourapp5b.o
or alternatively
g++ -c yourapp5b.cpp -o yourapp5b.o

g++ -ltwitcurl -o yourapp5b  tinyxml.o tinyxmlparser.o yourapp5b.o tinyxmlerror.o tinystr.o
(I previously copied all the *.o files into the folder where yourapp5b lives)

./yourapp5b -u joanqc -p jq****

It works correctly, but it is important to read the tinyxml documentation carefully to understand how it works:

d3.js: Data-Driven Documents

This is the library that replaces Protovis and lets me embed nice charts in a browser. Since I am using C++, the idea is that from C++ I can push a chart to a web browser, using Firefox as the presentation layer for the graphics.

I download: mbostock-d3-v1.25.0-6-g489eb0c.tar.gz


created by Joan Quintana Compte, July 2011
