WarLogs Dataset

The dataset contains a subset of the reports concerning the Iraq war, from 2004 to 2009, published by WikiLeaks on October 22, 2010.

The dataset, already cleaned and preprocessed, consists of a single relational table with the following attributes:

  • report_key | text: ID of report
  • to_timestamp | timestamp: date of release of report (up to the minute)
  • Type | text: Macro-classification of events in each report
  • category | text: Specific classification of each report
  • region | text: Class of location of the event
  • attack_on | text: target of event/attack of the report
  • coalition_forces_wounded | integer: n. coalition force units wounded in the event/attack
  • coalition_forces_killed | integer: n. coalition force units killed in the event/attack
  • iraq_forces_wounded | integer: n. Iraqi force units wounded in the event/attack
  • iraq_forces_killed | integer: n. Iraqi force units killed in the event/attack
  • civilian_wia | integer: n. civilians wounded in the event/attack
  • civilian_kia | integer: n. civilians killed in the event/attack
  • enemy_wia | integer: n. enemy units wounded in the event/attack
  • enemy_kia | integer: n. enemy units killed in the event/attack
  • enemy_detained | integer: n. enemy units captured in the event/attack
  • total_deaths | integer: total number of deaths in the event/attack
  • st_x | numeric: longitude of event/attack location
  • st_y | numeric: latitude of event/attack location

The dataset is in CSV format: warlogs.csv.zip
A small sample of the data (2000 reports) is also available: warlogs2000.csv.zip
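
As a starting point, the archive can be read without unpacking it. The following is a minimal sketch using only the Python standard library. It assumes (none of this is stated on the page) that the zip contains a single comma-separated CSV file with a header row whose names match the attributes listed above, encoded as UTF-8. The function name load_warlogs and the choice to map empty fields to None are illustrative; the timestamp is left as a string because its exact format is not documented here.

```python
import csv
import io
import zipfile

# Column names taken from the attribute list above.
INT_COLUMNS = [
    "coalition_forces_wounded", "coalition_forces_killed",
    "iraq_forces_wounded", "iraq_forces_killed",
    "civilian_wia", "civilian_kia",
    "enemy_wia", "enemy_kia", "enemy_detained", "total_deaths",
]
FLOAT_COLUMNS = ["st_x", "st_y"]  # longitude, latitude


def _convert(value, cast):
    """Cast a CSV field, mapping empty or missing values to None."""
    return cast(value) if value not in (None, "") else None


def load_warlogs(zip_path, member=None, encoding="utf-8"):
    """Read the CSV inside the archive into a list of dicts (one per report).

    Assumption: the archive holds one CSV file with a header row.
    `zip_path` may be a file name or a binary file-like object.
    """
    with zipfile.ZipFile(zip_path) as archive:
        name = member or archive.namelist()[0]
        with archive.open(name) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            rows = []
            for row in csv.DictReader(text):
                for col in INT_COLUMNS:
                    row[col] = _convert(row.get(col), int)
                for col in FLOAT_COLUMNS:
                    row[col] = _convert(row.get(col), float)
                rows.append(row)
    return rows
```

A call such as load_warlogs("warlogs2000.csv.zip") would then return the sample as a list of dictionaries, which is convenient for a first inspection before moving to the full file.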

Problem

The exercise requires performing two distinct clustering analyses on the dataset:

  • one aimed at grouping events based on their impact on the population and on the forces involved (casualties, captured or wounded units, etc.)
  • the other aimed at grouping events based on their location, in order to discover the geographic areas where events are densest. Optionally, the temporal dimension can be included in the process (e.g. to split the dataset, or directly as an additional attribute in the clustering).
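
The exercise does not prescribe any algorithm, so the following is only one possible sketch of the two analyses, written in plain Python for readability. It pairs k-means (for the impact attributes) with DBSCAN (for the coordinates, since the second task asks for dense areas). The function names, the farthest-first initialisation, the haversine distance in kilometres and all parameter values (k, eps, min_pts) are assumptions made for illustration, not part of the assignment. The brute-force neighbour search is quadratic, so it suits the 2000-report sample rather than the full dataset.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's k-means; returns (labels, centroids)."""
    # Farthest-first initialisation: deterministic, chosen here for simplicity.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def haversine_km(p, q):
    """Great-circle distance in km between two (longitude, latitude) pairs in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def dbscan(points, eps, min_pts, dist=haversine_km):
    """DBSCAN with a brute-force O(n^2) neighbour search; label -1 marks noise."""
    n = len(points)
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:  # only core points expand the cluster
                queue.extend(neighbours[j])
    return labels
```

For the first analysis, each report would be turned into a vector of its count attributes (for example civilian_kia, enemy_kia, enemy_detained) and passed to kmeans; whether to rescale or log-transform the counts first is a design choice left open here. For the second, dbscan would receive the (st_x, st_y) pairs, with eps expressed in kilometres.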

Each cluster should be properly explored and characterized in comparison with the others.
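
One simple way to begin such a characterization, given one cluster label per report (however it was obtained), is to compare each cluster's size and attribute means with the overall means. The sketch below assumes the reports are dictionaries keyed by attribute name and that the selected attributes hold numeric values with no missing entries. The function name cluster_profiles and the structure of its result are illustrative only.

```python
from collections import defaultdict
from statistics import mean


def cluster_profiles(rows, labels, attributes):
    """Summarise each cluster by its size and attribute means.

    Returns {label: {"size": ..., "means": {...}, "diff_from_overall": {...}}},
    where the last entry is the cluster mean minus the mean over all reports.
    """
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)

    overall = {a: mean(r[a] for r in rows) for a in attributes}

    profiles = {}
    for label, members in sorted(groups.items()):
        means = {a: mean(r[a] for r in members) for a in attributes}
        profiles[label] = {
            "size": len(members),
            "means": means,
            "diff_from_overall": {a: means[a] - overall[a] for a in attributes},
        }
    return profiles
```

Means are only a first look: for skewed count data, medians or the share of reports with a non-zero value per cluster may describe the groups better.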

dm/warlogs2013-14.1387456999.txt.gz · Last modified: 19/12/2013 at 12:43 by Mirco Nanni