The wealth of freely available, structured information on the Web is constantly growing. This is especially true for public data from and about governments and administrations. Data-providing projects from the linked open data community, such as DBpedia and Freebase, as well as structured data from domain-specific sites, such as senate.gov, USASpending.gov, or epp.eurostat.ec.europa.eu, make it possible to integrate data from multiple sources and thereby create new, value-added data sets. The recent appointment of Tim Berners-Lee to lead a review of how the UK government can open up access to official information reinforces this trend. However, integrating such data sources is far from trivial: apart from the technical difficulties of accessing the data, structural and semantic differences between the sources must be overcome. In particular, the individual data sets must be standardized, transformed into a common structure, cleaned, and finally consolidated into a single, consistent, and complete data set.
GovWILD started as a joint project between the Hasso Plattner Institute and IBM's Almaden Research Center. It integrates Open Government Data about politicians, parties, government agencies, funds, companies, and industry leaders into a clean and consistent data set. Individual components extract the data, scrub it, identify entities that occur in multiple sources, transform the data into a common structure, and finally fuse conflicting values into a rich, value-added data set. We have already integrated data from several EU and US sources. Citizens can explore the interlinked data through a Web interface, and the data is also available for download and further analysis. It can be used to uncover hidden connections between individuals in government and industry, to aggregate financial data, and to take a deep dive into the network of politics and industry.
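To make the scrub, entity-identification, and fusion steps more concrete, here is a minimal sketch in Python. It is not GovWILD's implementation: the sample records, the helper names (`scrub`, `cluster`, `fuse`, `integrate`), and the `MATCH_THRESHOLD` value are all hypothetical, and `difflib.SequenceMatcher` is swapped in as a simple string-similarity measure. It assumes the sources already share a common schema, normalizes names, groups records whose names are similar enough, and resolves conflicting attributes by ignoring missing values and keeping the most frequent one.

```python
from collections import Counter
from difflib import SequenceMatcher
import unicodedata

# Hypothetical sample records from two sources; schema and values are invented.
SOURCE_A = [
    {"name": "Jane  Doe", "party": "Party X", "country": None},
    {"name": "John Smith", "party": "Party Y", "country": "US"},
]
SOURCE_B = [
    {"name": "DOE, Jane", "party": "Party X", "country": "DE"},
    {"name": "Smith, John Q.", "party": None, "country": "US"},
    {"name": "Erika Mustermann", "party": "Party Z", "country": "DE"},
]

MATCH_THRESHOLD = 0.85  # assumed similarity cut-off, not a GovWILD parameter


def scrub(record: dict) -> dict:
    """Standardize a record: ASCII-fold, reorder 'Last, First', lowercase the name."""
    name = unicodedata.normalize("NFKD", record["name"])
    name = name.encode("ascii", "ignore").decode("ascii")
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    # Drop periods and collapse runs of whitespace.
    name = " ".join(name.replace(".", " ").lower().split())
    return {**record, "name": name}


def similar(a: str, b: str) -> bool:
    """Treat two scrubbed names as the same entity if they are similar enough."""
    return SequenceMatcher(None, a, b).ratio() >= MATCH_THRESHOLD


def cluster(records: list[dict]) -> list[list[dict]]:
    """Greedy duplicate detection: attach each record to the first matching cluster."""
    clusters: list[list[dict]] = []
    for rec in records:
        for group in clusters:
            if similar(rec["name"], group[0]["name"]):
                group.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


def fuse(group: list[dict]) -> dict:
    """Resolve conflicts per attribute: skip missing values, keep the most frequent one.

    Counter.most_common orders ties by first occurrence, so earlier sources win ties.
    """
    fused = {}
    for key in group[0]:
        values = [r[key] for r in group if r.get(key) is not None]
        fused[key] = Counter(values).most_common(1)[0][0] if values else None
    return fused


def integrate(*sources: list[dict]) -> list[dict]:
    """Scrub all records, detect duplicates across sources, and fuse each cluster."""
    scrubbed = [scrub(r) for src in sources for r in src]
    return [fuse(group) for group in cluster(scrubbed)]


if __name__ == "__main__":
    for entity in integrate(SOURCE_A, SOURCE_B):
        print(entity)
```

With the sample data, "Jane  Doe" and "DOE, Jane" collapse into one entity whose missing country is filled in from the second source, while conflicting or absent attributes of the two John Smith records are resolved by the majority rule. Comparing every record with every cluster is quadratic, so a system working at the scale of real government data would typically add blocking (comparing only records that share, for example, a name prefix) before the similarity test.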
|Information Systems Group||hpi.uni-potsdam.de/naumann/home|
|Linked Open Data Cloud||richard.cyganiak.de/2007/10/lod|