
 



Architecture

The integration process is implemented in a scalable Hadoop environment, with JAQL as the query language for JSON objects. JSON is a semi-structured, document-oriented storage format. Hadoop is an open-source framework that automatically parallelizes the map and reduce phases of a job across a cluster.
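To make the map and reduce phases concrete, here is a minimal single-process sketch in Python. It is not JAQL and does not run on Hadoop. It only imitates the map, shuffle, and reduce steps that the framework would distribute across a cluster. The sample records and the `type` field are hypothetical and chosen purely for illustration.

```python
import json
from collections import defaultdict
from itertools import chain

# Hypothetical sample records; the field names are illustrative only.
RECORDS = [
    '{"type": "person", "name": "A. Example"}',
    '{"type": "company", "name": "Example Corp"}',
    '{"type": "person", "name": "B. Example"}',
]


def map_phase(line):
    """Parse one JSON line and emit (key, value) pairs."""
    obj = json.loads(line)
    yield obj["type"], 1


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run(records):
    """Run map, shuffle, and reduce sequentially in one process."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = run(RECORDS)
print(counts)
```

Because each call to `map_phase` depends on a single record and each call to `reduce_phase` on a single key, both steps can be spread over many machines without coordination, which is the property a framework like Hadoop exploits.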

IBM's SystemT engine, with its declarative rule language AQL, extracts information from unstructured content. Google Refine is a tool for working with messy data: cleaning it up and transforming it from one format into another. The Duplicate Detection (DuDe) toolkit from the Information Systems Research Group at the Hasso Plattner Institut supports the search for duplicates in a variety of data sources. The Information Workbench (IWB), developed by our cooperation partner fluid Operations, is a self-service platform for linked data application development. An exporter converts the JSON objects to RDF triples so that the data can be visualized with the IWB.
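The export step can be pictured with a small Python sketch that turns one flat JSON object into N-Triples lines. This is an illustration under stated assumptions, not the project's actual exporter: the `http://example.org/` namespace, the `id` field used to build the subject URI, and the choice to emit every other field as a plain string literal are all hypothetical.

```python
import json

# Hypothetical namespace; the real vocabulary used by the project is not shown here.
BASE = "http://example.org/"


def escape_literal(text):
    """Escape backslashes, quotes, and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(obj, base=BASE):
    """Convert one flat JSON object into a list of N-Triples lines.

    Assumes the object has an "id" field (hypothetical) that identifies
    the subject; every other field becomes a string-literal property.
    """
    subject = f"<{base}entity/{obj['id']}>"
    lines = []
    for key, value in obj.items():
        if key == "id":
            continue
        predicate = f"<{base}property/{key}>"
        lines.append(f'{subject} {predicate} "{escape_literal(str(value))}" .')
    return lines


record = json.loads('{"id": "e1", "name": "Example Corp", "city": "Potsdam"}')
triples = to_ntriples(record)
for triple in triples:
    print(triple)
```

A real exporter would also need to handle nested objects, typed literals, and links between entities, which this sketch leaves out.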



[Figure: architecture]