Commit 68b307c4 authored by guglielmo

initial commit

*.sql
.DS_Store
This project contains the source code used for the automatic classification of the texts found in
the acts of the Italian Parliament, from the 16th legislature.
References
==========
MySQL dump of the 16th legislature (anonymized user data)
https://s3.amazonaws.com/op_backup/opp16_anonym.sql.gz
MySQL dump of the 17th legislature (anonymized user data)
https://s3.amazonaws.com/op_backup/opp17_anonym.sql.gz
Database ER design
See the files ``docs/opp_model.png`` and ``docs/opp_model.mwb`` (MySQL Workbench)
Contents
========
docs:
the ER schema, as a low-res PNG image,
the ER schema, as a MySQL Workbench file,
the SQL queries used for the views that extract texts and categories.
Usage
=====
Download the SQL dump, then decompress it and restore it into a MySQL instance:
wget https://s3.amazonaws.com/op_backup/opp16_anonym.sql.gz
mysql -uroot -e "create database opp16 default charset utf8;"
gunzip -c opp16_anonym.sql.gz | mysql -uroot opp16
Use the views to extract the texts or categories for each act:
select * from atto_texts;
select * from atto_tags;
Build the test and training sets using these extractions, either directly (sqlalchemy) or indirectly (CSV export + pandas, or other means).
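As a rough illustration of the last step, the sketch below joins rows from the two views into labelled samples and splits them into training and test sets using only the Python standard library. The column layout is an assumption, since the README does not document the views: ``atto_texts`` is taken to yield ``(act_id, text)`` pairs and ``atto_tags`` to yield ``(act_id, tag)`` pairs. The helper names ``build_dataset`` and ``train_test_split`` and the sample rows are hypothetical.

```python
import random
from collections import defaultdict


def build_dataset(text_rows, tag_rows):
    """Join (act_id, text) rows with (act_id, tag) rows into labelled samples.

    The two-column layout is an assumption about the atto_texts and
    atto_tags views; adapt the unpacking to the real columns.
    """
    tags_by_act = defaultdict(set)
    for act_id, tag in tag_rows:
        tags_by_act[act_id].add(tag)
    # Keep only acts with at least one tag: untagged acts carry no label.
    return [
        (act_id, text, sorted(tags_by_act[act_id]))
        for act_id, text in text_rows
        if act_id in tags_by_act
    ]


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle deterministically and split into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up rows standing in for the output of the two views.
text_rows = [(1, "testo A"), (2, "testo B"), (3, "testo C"),
             (4, "testo D"), (5, "testo E")]
tag_rows = [(1, "sanita"), (1, "lavoro"), (2, "scuola"),
            (3, "lavoro"), (4, "ambiente"), (5, "scuola")]

samples = build_dataset(text_rows, tag_rows)
train, test = train_test_split(samples, test_fraction=0.2, seed=42)
```

With real data, ``text_rows`` and ``tag_rows`` would come from the query results or from a CSV export rather than from literals. Fixing the seed keeps the split reproducible between runs.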