Best practices for publishing data

This presentation is given by Hjalmar Gislason of DataMarket. Hjalmar starts by showing us some of the data already uploaded to DataMarket, for instance on oil production in Libya, or everything related to the Netherlands. The data can be explored and shared, and most sets are free. So now, the best practices.

1) Use a simple format. Something that looks appealing and simple to a user, like this, is very hard for a computer to decipher.

According to Hjalmar, CSV is the best format available today.
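
To illustrate why a flat CSV is easy for a program to read, here is a minimal Python sketch. The column names and values are invented placeholders, not DataMarket data; the only assumption is one header row and one observation per row.

```python
import csv
import io

# Hypothetical data: one header row, one observation per row,
# no merged cells or decorative layout. Values are placeholders.
RAW = """country,year,value,unit
Exampleland,2009,100,thousand barrels per day
Exampleland,2010,110,thousand barrels per day
"""

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_rows(RAW)

# Because every row has the same shape, aggregating is a one-liner.
total = sum(int(row["value"]) for row in rows)
```

A spreadsheet with titles, footnotes and merged cells would need custom clean-up code before a loop like this could run.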

2) Indexes, unique IDs and metadata

Indexes: By adding the last-updated date, you lower the load on your end, because otherwise users might keep fetching your feed even though it has not changed.
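
A rough sketch of how a consumer could use a published last-updated date to avoid needless downloads. The function name `needs_refresh` and the dates are hypothetical; HTTP offers the same idea through the `Last-Modified` and `If-Modified-Since` headers.

```python
from datetime import datetime, timezone

def needs_refresh(published_last_updated, cached_last_updated):
    """Return True only when the publisher's copy is newer than the cached one."""
    if cached_last_updated is None:
        return True  # nothing cached yet, so a first download is needed
    return published_last_updated > cached_last_updated

# Hypothetical timestamps for illustration.
cached = datetime(2011, 1, 1, tzinfo=timezone.utc)
unchanged = datetime(2011, 1, 1, tzinfo=timezone.utc)
updated = datetime(2011, 2, 1, tzinfo=timezone.utc)
```

The client fetches the small index first and downloads the full data set only when `needs_refresh` returns True.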

Unique IDs: When Lehman Brothers went bankrupt, other financial institutions wanted to find out whether they were exposed. This was extremely hard, because Lehman in fact consisted of over 2000 entities.

Metadata: Share as much metadata as you can: URLs, methods, keywords and especially units. Context is everything!
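
One possible way to ship this context is a small metadata record published alongside the data file. The sketch below is only an assumption about what such a record could look like; every field name and value is hypothetical.

```python
import json

# Hypothetical metadata record; none of these values come from a real data set.
metadata = {
    "id": "example-dataset-001",          # unique ID for the data set
    "title": "Example oil production series",
    "source_url": "https://example.org/data/oil.csv",
    "method": "monthly survey",
    "keywords": ["oil", "production"],
    "unit": "thousand barrels per day",   # units are the easiest thing to forget
    "last_updated": "2011-02-01",
}

# Fields this sketch treats as mandatory (an illustrative choice).
REQUIRED = ("id", "source_url", "unit", "last_updated")

def missing_fields(meta):
    """List the required fields that are absent or empty."""
    return [key for key in REQUIRED if not meta.get(key)]

# Serialize so the record can be published next to the CSV file.
encoded = json.dumps(metadata, indent=2, sort_keys=True)
```

Running `missing_fields` before publishing is a cheap check that the units and the last-updated date were not left out.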

3) FAQs and feedback channels. The #1 reason institutions give for not sharing data is that the data set might contain errors. However, this should be the main reason to release it anyway: when people are using your data set and you have a feedback loop, they can help you find and fix those errors. See also the Strata talk on the UK legislation database.

These are some useful handles for people who want to start sharing their data.