Commit 68b307c4 authored by guglielmo

initial commit

This project contains the source code used for the automatic classification of the texts found in
the acts of the Italian Parliament, starting from the 16th legislature.
MySQL dump of the 16th legislature (anonymized user data)
MySQL dump of the 17th legislature (anonymized user data)
Database ER design
See files ``docs/opp_model.png`` and ``docs/opp_model.mwb`` (MySQL Workbench)
the ER schema, as a low-res PNG image,
the ER schema, as a MySQL Workbench file,
the SQL queries used for the views that extract texts and categories.
Download the SQL dump, decompress it and restore it into a MySQL instance:
mysql -uroot -e "create database opp16 default charset utf8;"
gzip -dc opp16_anonym.sql.gz | mysql -uroot opp16
Use the views to extract the texts or categories for each act:
select * from atto_texts;
select * from atto_tags;
Build the test and training sets from these extractions, either directly (SQLAlchemy) or indirectly (CSV export + pandas, or other means).
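As a rough illustration of the indirect route, the sketch below joins CSV exports of the two views on the act identifier and splits the result into training and test sets using only the Python standard library. The column names ``atto_id``, ``text`` and ``tag``, as well as the helper names ``load_examples`` and ``split_train_test``, are assumptions made for this example; check the actual view definitions before relying on them.

```python
import csv
import io
import random
from collections import defaultdict


def load_examples(texts_csv, tags_csv):
    """Join text rows and tag rows on the act id.

    Both arguments are open file-like objects holding CSV data with a header.
    The column names (atto_id, text, tag) are assumed, not taken from the views.
    """
    tags_by_act = defaultdict(list)
    for row in csv.DictReader(tags_csv):
        tags_by_act[row["atto_id"]].append(row["tag"])

    examples = []
    for row in csv.DictReader(texts_csv):
        act_id = row["atto_id"]
        # Acts without any tag get an empty label list.
        examples.append((act_id, row["text"], sorted(tags_by_act.get(act_id, []))))
    return examples


def split_train_test(examples, test_fraction=0.2, seed=0):
    """Shuffle reproducibly and return (train, test) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the exported CSV files (made-up rows).
    texts = io.StringIO("atto_id,text\n1,first act\n2,second act\n3,third act\n")
    tags = io.StringIO("atto_id,tag\n1,economy\n1,taxes\n3,health\n")
    train, test = split_train_test(load_examples(texts, tags), test_fraction=0.34)
    print(len(train), "training examples,", len(test), "test examples")
```

With real exports, the ``io.StringIO`` objects would be replaced by files opened with ``open(path, newline="", encoding="utf-8")``. A fixed seed keeps the split reproducible between runs.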