OAI-PMH details

One of our electronic data feeds uses the OAI-PMH protocol. (See here for an introduction to Culture24 data feeds.) The OAI-PMH protocol itself is described at www.openarchives.org/OAI/openarchivesprotocol.html, and there are plenty of tutorials and toolkits available to help if you’re new to OAI.

We have two OAI feeds: a summary feed and a full feed. Our data is split up into records (articles, events, venues, etc.), with each record split into fields (description, longitude, latitude, title, opening hours, etc.).

The only difference between our summary data feed and our full data feed is the number of fields included with each record. In our OAI feeds, data is split up into sets. Each set contains records of a particular type, so our OAI sets are: resources, articles, events, websites, and venues.

Detailed descriptions of the data fields are available here.


Let’s play!
OK, time to play. OAI data feeds aren’t intended for direct human consumption, but they just serve up XML over HTTP, so you can point your web browser at the feed and you’ll be able to pick out recognizable content in the XML you get back.

The Culture24 summary OAI-PMH feed (the open feed) is at: http://www.culture24.org.uk/api/oai/open. (This link isn’t live because you need to complete the URL by specifying what you want to retrieve. See below for some examples.)

OAI-PMH requests work by adding arguments to the end of this base URL. Here are some examples to try out:

http://www.culture24.org.uk/api/oai/open?verb=Identify will return some basic technical information about the feed, including, for example, who to contact if you have any questions.

http://www.culture24.org.uk/api/oai/open?verb=ListMetadataFormats asks the server which record formats it supports (OAI-PMH calls these metadata formats). If you run this request, you’ll see that the Culture24 OAI server supports two formats: the oai_dc format and the c24 format. oai_dc is a Dublin Core version of the records. Supporting it is a basic requirement of OAI: all OAI-PMH servers need to support it. c24 is the more native format, where you’ll see our data with more detail and precision.

http://www.culture24.org.uk/api/oai/open?verb=ListSets will ask the server to list all the sets it knows about. It will respond saying it knows about events, venues, articles, websites, and resources.

http://www.culture24.org.uk/api/oai/open?verb=ListRecords&metadataPrefix=c24&set=resources is where we finally get some real data. This request says: “I want your resources, and I’d like them in Culture24 format.” You’ll get a whole bunch of (quite small) records back. At the end, you’ll see something called a resumption token. This is OAI’s way of saying: “I’ve got lots of records here: here are the first lot; if you want more, send this resumption token back and I’ll pick up where I left off.” This is a really efficient way to transmit big datasets: your XML parser won’t fall over because it had a huge document to parse, and you don’t have to take records one by one, which can be really slow (though OAI does provide a way to do that if you want).

While you’re in there, notice that every record has a little header section, which includes a datestamp (last modified date) and an identifier. The identifier is guaranteed to be unique to this record, so you can store it in your system and rely on Culture24 using the same identifier the next time you come across the same record.

Check out the description of the protocol at http://www.openarchives.org/OAI/openarchivesprotocol.html for more details of what you can do with OAI-PMH.
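If you’d rather script this than click through it in a browser, here is a minimal sketch in Python of the ListRecords loop described above. It is only an illustration, not our official client: the function names (build_url, fetch, harvest_headers) are made up for this example, it sends the resumption token back as a plain GET request, and it assumes the responses use the standard OAI-PMH 2.0 XML namespace. It reads only the header of each record (identifier and datestamp), not the c24 fields themselves.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 namespace (assumed to be what the feed returns).
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "http://www.culture24.org.uk/api/oai/open"


def build_url(base_url, **params):
    """Append OAI-PMH arguments to the base URL as a query string."""
    return base_url + "?" + urllib.parse.urlencode(params)


def fetch(url):
    """Fetch a URL and return the response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def harvest_headers(base_url, metadata_prefix, set_spec, fetcher=fetch):
    """Yield (identifier, datestamp) for each record, following resumption tokens."""
    url = build_url(base_url, verb="ListRecords",
                    metadataPrefix=metadata_prefix, set=set_spec)
    while url:
        root = ET.fromstring(fetcher(url))
        for header in root.iter(OAI_NS + "header"):
            yield (header.findtext(OAI_NS + "identifier"),
                   header.findtext(OAI_NS + "datestamp"))
        token = root.findtext(".//" + OAI_NS + "resumptionToken")
        if token and token.strip():
            # In OAI-PMH the resumption token is sent with the verb only,
            # without repeating metadataPrefix or set.
            url = build_url(base_url, verb="ListRecords",
                            resumptionToken=token.strip())
        else:
            # A missing or empty token means there are no more records.
            url = None
```

To use it, loop over harvest_headers(BASE_URL, "c24", "resources") and store each identifier alongside its datestamp. The fetcher argument is there so you can swap in your own HTTP code, or a fake one for testing.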


Full and Redacted feeds
So far we’ve been looking at the open feed. We also publish a full feed, which includes more fields, including, for example, the full text of our news articles, and lots of other stuff. To get access to the full feed, you need to get a key. Contact us to find out more.

Meanwhile, to allow you to get going immediately, we have a special version of the full feed which we call the redacted feed. This is exactly the same as the full feed, except some of the data has been overwritten with underscore characters.

The redacted feed is at http://www.culture24.org.uk/api/oai/redacted. When you get a real key from us, just put it at the end of the URL in place of the word “redacted”. For example, if your key turns out to be GxYX17abc23, then your full feed URL will be http://www.culture24.org.uk/api/oai/GxYX17abc23. You can try out all the URLs above on the redacted feed. Check out, for example, http://www.culture24.org.uk/api/oai/redacted?verb=ListRecords&metadataPrefix=c24&set=resources.
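Because the key simply replaces the last part of the path, switching between the redacted feed and the full feed can be a one-line change in your code. Here is a small sketch in Python. The helper names (feed_url, request_url) and the placeholder key "YOUR_KEY" are made up for illustration; substitute the key you get from us.

```python
import urllib.parse

API_ROOT = "http://www.culture24.org.uk/api/oai/"


def feed_url(key="redacted"):
    """Return the feed base URL for a key; defaults to the redacted feed."""
    # quote() keeps the URL valid if a key contains unusual characters.
    return API_ROOT + urllib.parse.quote(key, safe="")


def request_url(key="redacted", **params):
    """Build a complete OAI-PMH request URL against the chosen feed."""
    return feed_url(key) + "?" + urllib.parse.urlencode(params)


# Redacted feed, usable straight away:
redacted_request = request_url(verb="ListRecords", metadataPrefix="c24", set="resources")

# Full feed, once you have a key ("YOUR_KEY" is a placeholder, not a real key):
full_request = request_url("YOUR_KEY", verb="ListRecords", metadataPrefix="c24", set="resources")
```

Developing against the redacted feed first and changing only the key later means the rest of your harvesting code stays the same.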