Following on from our previous posts on the right to attend, report and record local council meetings, the Department for Communities and Local Government has announced that it will be changing the law to further open up council meetings.
In particular, the new regulations, which come into effect on September 10, will reduce the permitted reasons for holding meetings behind closed doors and widen the definition of media to cover “organisations that… provide news to the public by means of the internet”, thus including hyperlocal blogs.
Will Perrin at TalkAboutLocal has a good summary of some of the other issues in the announcement, but this is important news for hyperlocal bloggers, particularly those that have faced obstruction and restrictions from their local councils.
This post is by Andrew Speakman, who’s coordinating OpenlyLocal’s planning application work.
We can now report good progress on our plan to develop community scrapers to underpin the new incarnation of PlanningAlerts.com. The plan applies to those councils that use non-standard planning systems and our latest estimate is that there are around 100 of these sites.
There are now 13 working scrapers, with more on the way, and some of the £75 bounties have already been paid out. The data from these scrapers is being regularly uploaded into OpenlyLocal using the ScraperWiki API, and you can see the results here with Crawley Council:
The list of the authorities being scraped by this method is as follows:
- East Sussex
- Isle of Wight
- Nuneaton and Bedworth
- Telford and Wrekin
This is still very much a work in progress, but the scrapers in the above list that we’ve linked to are all running well and collecting up-to-date planning applications with locations. As soon as we turn on the alerts system (currently being tested), we will start sending out email alerts for them.
There are also some councils for which we’re importing the data but can’t yet send alerts – an example is Wokingham from the above list – because they do not include postcodes in the planning application details, and our location coding is based on postcodes to ensure our data is fully open. If anyone from authorities such as Wokingham wants to rectify this situation, we’re more than happy to work with them.
If you want to get involved in helping scrape the UK’s planning data and building an open database of planning applications for the whole of the UK, contact me at firstname.lastname@example.org. Further details about data fields and the available planning authorities are defined in this shared Google spreadsheet.
This post is by Andrew Speakman, who’s coordinating OpenlyLocal’s planning application work.
As Chris wrote in his last post announcing OpenlyLocal’s progress in building an open database of planning applications, while we can do the importing from the main planning systems, if we’re really going to cover the whole country, we’re going to need the community’s help. I’m going to be coordinating this effort and so I thought it would be useful to explain how we’re going to do this (you can contact me at email@example.com).
First, we’re going to use the excellent ScraperWiki as the main platform for writing external scrapers. It supports Python, Ruby and PHP, and has worked well for similar schemes. It also means the scraper is openly available and we can see it in action. We will then use the ScraperWiki API to upload the data regularly into OpenlyLocal.
Second, we’re going to break the job into manageable chunks by focusing on target groups of councils, and just to sweeten things – as if building a national open database of planning applications wasn’t enough – we’re going to offer small bounties (£75) for successful scrapers for these councils.
We have some particular requirements designed to make the system maintainable, and do things the right way, but not many are fixed in stone, so feel free to respond with suggestions if you want to do it in a different way.
For example, the scraper should keep itself current (running on a daily basis), but also behave nicely (not putting an excessive load on ScraperWiki or the target website by trying to get too much data in one go). In addition, we propose that the scrapers should update current applications on a daily basis and also make inroads into the backlog by gathering a batch of previous applications. On each run, a scraper should:
- Create new database records for any new applications that have appeared on the site since the last run and store the identifiers (uid and url).
- Create new database records for a batch of missing older applications and store the identifiers (uid and url). Currently the scrapers are set up to work backwards from the earliest stored application towards a target date in the past.
- Update the most current applications by collecting and saving the full application details. At the moment the scrapers update the details of all applications from the past 60 days.
- Update the full application details of a batch of older applications where the uid and url have been collected (as above) but the application details are missing. At the moment the scrapers work backwards from the earliest “empty” application towards a target date in the past.
The data fields to be gathered for each planning application are defined in this shared Google spreadsheet. Not all the fields will be available on every site, but we want all those that are there.
Note the following:
- The minimal valid set of fields for an application is: ‘uid’, ‘description’, ‘address’, ‘start_date’ and ‘date_scraped’
- The ‘uid’ is the database primary key field
- All dates (except date_scraped) should be stored in ISO8601 format
- The ‘start_date’ field is set to the earlier of the ‘date_received’ and ‘date_validated’ fields, or to whichever one is available
- The ‘date_scraped’ field is a date/time (RFC3339) set to the current time when the full application details are updated. It should be indexed.
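As a rough illustration of the field rules above, here is a minimal Python sketch. The helper names `build_record` and `is_valid` are made up for this example, and it assumes the scraper has already parsed the site's dates into `datetime.date` objects.

```python
from datetime import date, datetime, timezone

# The minimal valid set of fields listed above.
REQUIRED_FIELDS = ("uid", "description", "address", "start_date", "date_scraped")
DATE_FIELDS = ("date_received", "date_validated")


def build_record(scraped):
    """Return a copy of the scraped fields with the date rules applied."""
    record = dict(scraped)

    # start_date is the earlier of the two dates, or whichever one exists.
    available = [record[name] for name in DATE_FIELDS if record.get(name)]
    if available:
        record["start_date"] = min(available).isoformat()

    # Ordinary dates are stored as ISO 8601 strings (YYYY-MM-DD).
    for name in DATE_FIELDS:
        if isinstance(record.get(name), date):
            record[name] = record[name].isoformat()

    # date_scraped is an RFC 3339 date/time for the moment of this update.
    record["date_scraped"] = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return record


def is_valid(record):
    """True when every field in the minimal valid set is present and non-empty."""
    return all(record.get(name) for name in REQUIRED_FIELDS)
```

A record with only a ‘uid’ would fail `is_valid`, while one with a description, an address and at least one of the two dates would pass.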
So how do you get started? Here’s a list of 10 non-standard authorities that you can choose from: Aberdeen, Aberdeenshire, Ashfield, Bath, Calderdale, Carmarthenshire, Consett, Crawley, Elmbridge and Flintshire. Have a look at the sites and then let me know if you want to reserve one and how long you think it will take to write your scraper.
Well, that took a little longer than planned…
[I won't go into the details, but suffice to say our internal deadline got squeezed between the combination of a fast-growing website, the usual issues of large datasets, and that tricky business of finding and managing coders who can program in Ruby, get data, and be really good at scraping tricky websites.]
But I’m pleased to say we’re now well on our way to not just resurrecting PlanningAlerts in a sustainable, scalable way but a whole lot more too.
Where we’re heading: an open database of UK planning applications
First, let’s talk about the end goal. From the beginning, while we wanted to get PlanningAlerts working again – the simplicity of being able to put in your postcode and email address and get alerts about nearby planning applications is both useful and compelling – we also knew that if the service was going to be sustainable, and serve the needs of the wider community we’d need to do a whole lot more.
Particularly with the significant changes in the planning laws and regulations that are being brought in over the next few years, it’s important that everybody – individuals, community groups, NGOs, other websites, even councils – has good and open access to not just the planning applications in their area, but in the surrounding areas too.
In short, we wanted to create the UK’s first open database of planning applications, free for reuse by all.
That meant not just finding when there was a planning application, and where (though that’s really useful), but also capturing all the other data too, and keeping that information updated as the planning application went through the various stages (the original PlanningAlerts just scraped the information once, when it was found on the website, and even then pretty much just got the address and the description).
Of course, were local authorities to publish the information as open data, for example through an API, this would be easy. As it is, with a couple of exceptions, it means an awful lot of scraping, and some pretty clever scraping too, not to mention upgrading the servers and making OpenlyLocal more scalable.
Where we’ve got to
Still, we’ve pretty much overcome these issues and now have hundreds of scrapers working, pulling the information into OpenlyLocal from well over a hundred councils, and now have well over half a million planning applications in there.
There are still some things to be sorted out – some of the council websites seem to shut down for a few hours overnight, meaning they appear to be broken when we visit them, others change URLs without redirecting to the new ones, and still others are just, well, flaky. But we’ve now got to a stage where we can start opening up the data we have, for people to play around with, find issues with, and start to use.
For a start, each planning application has its own permanent URL, and the information is also available as JSON or XML:
There’s also a page for each council, showing the latest planning applications, and the information here is available via the API too:
There’s a GeoRSS feed for each council too, allowing you to keep up to date with the latest planning applications for your council. It also means you can easily create maps or widgets showing the council’s latest applications.
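To give a flavour of what a map or widget would start from, here is a small Python sketch that pulls the coordinates out of a GeoRSS-Simple feed. The sample feed, its titles and its example.org links are invented for illustration and are not taken from OpenlyLocal's real output; a real feed may carry different elements.

```python
import xml.etree.ElementTree as ET

# Namespace used by GeoRSS-Simple elements such as <georss:point>.
GEORSS_NS = {"georss": "http://www.georss.org/georss"}

# Invented sample feed: one item with a location, one without.
SAMPLE_FEED = """<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Example council planning applications</title>
    <item>
      <title>EX/2011/0001: Single storey rear extension</title>
      <link>http://example.org/planning_applications/1</link>
      <georss:point>51.1092 -0.1872</georss:point>
    </item>
    <item>
      <title>EX/2011/0002: Change of use</title>
      <link>http://example.org/planning_applications/2</link>
    </item>
  </channel>
</rss>"""


def parse_points(feed_xml):
    """Return a list of dicts for the items that carry a georss:point."""
    root = ET.fromstring(feed_xml)
    points = []
    for item in root.iter("item"):
        point = item.find("georss:point", GEORSS_NS)
        if point is None or not point.text:
            continue  # no location, so nothing to plot
        lat, lon = (float(part) for part in point.text.split())
        points.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "lat": lat,
            "lon": lon,
        })
    return points
```

The resulting list of titles and latitude/longitude pairs is the sort of thing most mapping libraries will accept as marker data.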
Finally, Andrew Speakman, who’d coincidentally been doing some great stuff in this area, has joined the team as Planning editor, to help coordinate efforts and liaise with the community (more on this below).
The next main task is to reinstate the original PlanningAlert functionality. That’s our focus now, and we’re about halfway there (and aiming to get the first alerts going out in the next 2-3 weeks).
We’ve also got several more councils and planning application systems to add, and this should bring the number of councils we’ve got on the system to between 150 and 200. This will be an ongoing process, over the next couple of months. There’ll also be some much-overdue design work on OpenlyLocal so that the increased amount of information on there is presented to the user in a more intuitive way – please feel free to contact us if you’re a UX person/designer and want to help out.
We also need to improve the database backend. We’ve been using MySQL exclusively since the start, but MySQL isn’t great at spatial (i.e. geographic) searches, restricting the sort of functionality we can offer. We expect to sort this in a month or so, probably moving to PostGIS, and after that we can start to add more features, finer grained searches, and start to look at making the whole thing sustainable by offering premium services.
We’ll be working too on liaising with councils who want to offer their applications via an API – as the ever pioneering Lichfield council already does – or a nightly data dump. This not only does the right thing in opening up data for all to use, but also means we don’t have to scrape their websites. Lichfield, for example, uses the Idox system, and the web interface for this (which is what you see when you look at a planning application on Lichfield’s website) spreads the application details over 8 different web pages, but the API makes this available on a single URL, reducing the work the server has to do.
Finally, we’re going to be announcing a bounty scheme for the scraper/developer community to write scrapers for those areas that don’t use one of the standard systems. Andrew will be coordinating this, and will be blogging about this sometime in the next week or so (and you can contact him at planning at openlylocal dot com). We’ll also be tweeting progress at @planningalert.
Thanks for your patience.
As you may have read from our previous post, OpenlyLocal is taking over PlanningAlerts.com to enable this valuable service not just to continue, but also develop into a much richer community resource.
We’ve already started work on this, but we need a Ruby programmer to help us with this (and maybe much more). This is what we’re looking for:
- Great Ruby skills, and solid experience with Ruby on Rails (OpenlyLocal is a Rails app)
- Experience with Test Driven Development
- An all-rounder with some Linux sysadmin skills
- Have done enough scraping to be able to handle those pesky .asp pages
- Eagerness to work with open data
One of the first and best examples of how data could make a difference to ordinary people’s lives was the inspirational PlanningAlerts.com, built by Richard Pope, Mikel Maron, Sam Smith, Duncan Parkes, Tom Hughes and Andy Armstrong.
In doing one simple thing – allowing ordinary people to subscribe to an email alert when there was a planning application near them, regardless of council boundaries – it showed that data mattered, and more than that had the power to improve the interaction between government and the community.
It did so many revolutionary things and fought so many important battles that everyone in the open data world (and not just the UK) owes all those who built it a massive debt of gratitude. Richard Pope and Duncan Parkes in particular put in masses of hours writing scrapers, fighting the battle to open postcodes and providing a simple but powerful user experience.
However, over the past year it had become increasingly difficult to keep the site going, with many of the scrapers falling into disrepair (aka scraper rot). Add to that the demands of a day job, and the cost of running a server, and it’s a tribute to both Richard and Duncan that they kept PlanningAlerts going for as long as they did.
So when Richard reached out to OpenlyLocal and asked if we were interested in taking over PlanningAlerts we were both flattered and delighted. Flattered and delighted, but also a little nervous. Could we take this on in a sustainable manner, and do as good a job as they had done?
Well after going through the figures, and looking at how we might architect it, we decided we could – there were parts of the problem that were similar to what we were already doing with OpenlyLocal – but we’d need to make sustainability a core goal right from the get-go. That would mean a business plan, and also a way for the community to help out.
Both of those had been given thought by both us and Richard, and we’d come to pretty much identical ideas, using a freemium model to generate income, and ScraperWiki to allow the community to help with writing scrapers, especially for those councils that didn’t use one of the common systems. But we also knew that we’d need to accelerate this process using a bounty model, such as the one that’s been so successful for OpenCorporates.
Now all we needed was the finance to kick-start the whole thing, and we contacted Nesta to see if they were interested in providing seed funding by way of a grant. I’ve been quite critical of Nesta’s processes in the past, but to their credit they didn’t hold this against us, and more than that showed they were capable of, and eager to work in, a fast, lightweight & agile way.
We didn’t quite manage to get the funding or do the transition before Richard’s server rental ran out, but we did save all the existing data, and are now hard at work building PlanningAlerts into OpenlyLocal, and gratifyingly making good progress. The PlanningAlerts.com domain is also in the middle of being transferred, and this should be completed in the next day or so.
We expect to start displaying the original scraped planning applications over the next few weeks, and have already started work on scrapers for the main systems used by councils. We’ll post here, and on the OpenlyLocal and PlanningAlert twitter accounts as we progress.
We’re also liaising with PlanningAlerts Australia, who were originally inspired by PlanningAlerts UK, but have since considerably raised the bar. In particular we’ll be aiming to share a common data structure with them, making it easy to build applications based on planning applications from either source.
Like buses, you wait ages for local councils to publish their spending data, then a whole load come at once… and consequently OpenlyLocal has been importing the data pretty much non-stop for the past month or so.
We’ve now imported spending data for over 140 councils with more being added each day, and now have over a million and a half payments to suppliers, totalling over £10 billion. I think it’s worth repeating that figure: Ten Billion Pounds, as it’s a decent chunk of change, by anybody’s measure (although it’s still only a fraction of all spending by councils in the country).
Along with that we’ve also made loads of improvements to the analysis and data, some visible, others not so much (we’ve made loads of much-needed back-end improvements now that we’ve got so much more data), and to mark breaking the £10bn figure I thought it was worth starting a series of posts looking at the spending dataset.
Let’s start by having a look at those headline figures (we’ll be delving deeper into the data for some more heavyweight data-driven journalism over the next few weeks):
144 councils. That’s about 40% of the 354 councils in England (including the GLA). Some of the others we just haven’t yet imported (we’re adding them at about 2 a day); others have problems with the CSV files they are publishing (corrupted or invalid files, or where there’s some query about the data itself), and where there’s a contact email we’ve notified them of this.
The rest are refusing to publish the CSV files specified in the guidelines, deciding to make it difficult to automatically import by publishing an Excel file or, worse, a PDF (and here I’d like to single out Birmingham council, the biggest in the UK, which shamefully is publishing its spending only as a PDF, and even then with almost no detail at all. One wonders what they are hiding).
£10,184,169,404 in 1,512,691 transactions. That’s an average transaction value of £6,732 per payment. However this is not uniform across councils, varying from an average transaction value of £669 for Poole to £46,466 for Barnsley. (In future posts, I’ll perhaps have a look at using the R statistical language to do some histograms on the data, although I’d be more than happy if someone beat me to that).
194,128 suppliers. What does this mean? To be accurate, this is the total number of supplying relationships between the councils and the companies/people/things they are paying.
Sometimes a council may have (or appear to have) several supplier relationships with the same company (charity/council/police authority), using different names or supplier IDs. This is sometimes down to a mistake in keying in the data, or for internal reasons, but either way it means several supplier records are created. It’s also worth noting that redacted payments are often grouped together as a single ‘supplier’, as the council may not have given any identifier to show that a redacted payment of £50,000 to a company (and in general there’s little reason to redact such payments) is to a different recipient than a redacted payment of £800 to a foster parent, for example.
However, using some clever matching and with the help of the increasing number of users who are matching suppliers to companies/charities and other entities on OpenlyLocal (just click on ‘add info’ when you’re looking at a supplier you think you can match to a company or charity), we’ve matched about 40% of these to real-world organisations such as companies and charities.
While that might not seem very high, a good proportion of the rest will be sole-traders, individuals, or organisations we’ve not yet got a complete list of (Parish and Town councils, for example). And what it does mean is we can start to get a first draft of who supplies local government. And this is what we’ve got:
66,165 companies, with total payments of £3,884,271,203 (£3.88 billion), 38.1% of the total £10bn, in 579,518 transactions, making an average payment of £6,702.
8,236 charities, with total payments of £415,878,177, 4.1% of the total, in 55,370 transactions, making an average payment of £7,511.
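For anyone wanting to check the arithmetic, the shares and averages can be recomputed from the totals quoted in this post. The short Python sketch below does that; the `summarise` helper is just a name made up for this example, and the unrounded averages land within a pound of the whole-pound figures given above.

```python
# Headline totals quoted in this post.
TOTAL_SPEND = 10_184_169_404
TOTAL_TRANSACTIONS = 1_512_691

# (total payments in pounds, number of transactions) per matched group.
GROUPS = {
    "companies": (3_884_271_203, 579_518),
    "charities": (415_878_177, 55_370),
}


def summarise(spend, transactions, total=TOTAL_SPEND):
    """Share of all spending (percent, 1 decimal place) and mean payment."""
    return {
        "share_pct": round(100 * spend / total, 1),
        "average": spend / transactions,
    }


if __name__ == "__main__":
    overall = TOTAL_SPEND / TOTAL_TRANSACTIONS
    print("all payments: average £{:,.2f}".format(overall))
    for name, (spend, transactions) in GROUPS.items():
        stats = summarise(spend, transactions)
        print("{}: {}% of total, average £{:,.2f}".format(
            name, stats["share_pct"], stats["average"]))
```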
Next time, we’ll look at the company suppliers in a little more detail, and later on the charities too, but for the moment, as you can see, we’re listing the top 20 matched individual companies and charities that supply local government. Bear in mind a company like Capita does business with councils through a variety of different companies, and there’s no public dataset of the relationships between the companies, but that’s another story.
Finally, the whole dataset is available to download as open data under the same share-alike attribution licence as the rest of OpenlyLocal, including the matches to companies/charities that are receiving the money (the link is at the bottom of the Council Spending Data Dashboard). Be warned, however, it’s a very big file (there’s a row for every transaction), and so is too big for Excel (or even Google Fusion tables for that matter), so it’s of most use to those using a database, or doing academic research.
* Note: there are inevitably loads of caveats to this data, including that councils are (despite the guidance) publishing the data in different ways, including, occasionally, aggregating payments, and using over-aggressive redaction. It’s also, obviously, only 40% of the councils in England, although that’s a pretty big sample size. Finally there may be errors both in the data as published, and in the importing of it. Please do let us know at firstname.lastname@example.org if you see any errors, or figures that just look wrong.