This post is by Andrew Speakman, who’s coordinating OpenlyLocal’s planning application work.
As Chris wrote in his last post announcing OpenlyLocal’s progress in building an open database of planning applications, while we can do the importing from the main planning systems, if we’re really going to cover the whole country, we’re going to need the community’s help. I’m going to be coordinating this effort and so I thought it would be useful to explain how we’re going to do this (you can contact me at firstname.lastname@example.org).
First, we’re going to use the excellent ScraperWiki as the main platform for writing external scrapers. It supports Python, Ruby and PHP, and has worked well for similar schemes. It also means the scraper is openly available and we can see it in action. We will then use the Scraperwiki API to upload the data regularly into OpenlyLocal.
Second, we’re going to break the job into manageable chunks by focus on target groups of councils, and just to sweeten things – as if building a national open database of planning applications wasn’t enough – we’re going to offer small bounties (£75) for successful scrapers for these councils.
We have some particular requirements designed to make the system maintainable, and do things the right way, but not many are fixed in stone, so feel free to respond with suggestions if you want to do it in a different way.
For example, the scraper should keep itself current (running on a daily basis), but also behave nicely (not putting an excessive load on Scraperwiki or the target website by trying to get too much data in one go). In addition we propose that the scrapers should operate by updating current applications on a daily basis and also make inroads into the backlog by gathering a batch of previous applications.
- Create new database records for any new applications that have appeared on the site since the last run and store the identifiers (uid and url).
- Create new database records of a batch of missing older applications and store the identifiers (uid and url). Currently the scrapers are set up to work backwards from the earliest stored application towards a target date in the past
- Update the most current applications by collecting and saving the full application details. At the moment the scrapers update the details of all applications from the past 60 days.
- Update the full application details of a batch of older applications where the uid and url has been collected (as above) but the application details are missing. At the moment the scrapers work backwards from the earliest “empty” application towards a target date in the past
The data fields to be gathered for each planning application are defined in this shared Google spreadsheet. Not all the fields will be available on every site, but we want all those that are there.
Note the following:
- The minimal valid set of fields for an application is: ‘uid’, ‘description’, ‘address’, ‘start_date’ and ‘date_scraped’
- The ‘uid’ is the database primary key field
- All dates (except date_scraped) should be stored in ISO8601 format
- The ‘start_date’ field is set to the earliest of the ‘date_received’ or ‘date_validated’ fields, depending on which is available
- The ‘date_scraped’ field is a date/time (RFC3339) set to the current time when the full application details are updated. It should be indexed.
So how do you get started? Here’s a list of 10 non-standard authorities that you can choose from. Aberdeen, Aberdeenshire, Ashfield, Bath, Calderdale, Carmarthenshire, Consett, Crawley, Elmbridge, Flintshire. Have a look at the sites and then let me know if you want to reserve one and how long you think it will take to write your scraper.
Well, that took a little longer than planned…
[I won't go into the details, but suffice to say our internal deadline got squeezed between the combination of a fast-growing website, the usual issues of large datasets, and that tricky business of finding and managing coders who can program in Ruby, get data, and be really good at scraping tricky websites.]
But I’m pleased to say we’ve now well on our way to not just resurrecting PlanningAlerts in a sustainable, scalable way but a whole lot more too.
Where we’re heading: a open database of UK planning applications
First, let’s talk about the end goal. From the beginning, while we wanted to get PlanningAlerts working again – the simplicity of being able to put in your postcode and email address and get alerts about nearby planning applications is both useful and compelling – we also knew that if the service was going to be sustainable, and serve the needs of the wider community we’d need to do a whole lot more.
Particularly with the significant changes in the planning laws and regulations that are being brought in over the next few years, it’s important that everybody – individuals, community groups, NGOs, other websites, even councils – have good and open access to not just the planning applications in their area, but in the surrounding areas too.
In short, we wanted to create the UK’s first open database of planning applications, free for reuse by all.
That meant not just finding when there was a planning application, and where (though that’s really useful), but also capturing all the other data too, and also keep that information updated as the planning application went through the various stages (the original PlanningAlerts just scraped the information once, when it was found on the website, and even then pretty much just got the address and the description).
Of course, were local authorities to publish the information as open data, for example through an API, this would be easy. As it is, with a couple of exceptions, it means an awful lot of scraping, and some pretty clever scraping too, not to mention upgrading the servers and making OpenlyLocal more scalable.
Where we’ve got to
Still, we’ve pretty much overcome these issues and now have hundreds of scrapers working, pulling the information into OpenlyLocal from well over a hundred councils, and now have well over half a million planning applications in there.
There are still some things to be sorted out – some of the council websites seem to shut down for a few hours overnight, meaning they appear to be broken when we visit them, others change URLs without redirecting to the new ones, and still others are just, well, flaky. But we’ve now got to a stage where we can start opening up the data we have, for people to play around with, find issues with, and start to use.
For a start, each planning application has its own permanent URL, and the information is also available as JSON or XML:
There’s also a page for each council, showing the latest planning applications, and the information here is available via the API too:
There’s also a GeoRSS feed for each council too allowing you to keep up to date with the latest planning applications for your council. It also means you can easily create maps or widgets for the council, showing the latest applications of the council.
Finally, Andrew Speakman, who’d coincidentally been doing some great stuff in this area, has joined the team as Planning editor, to help coordinate efforts and liaise with the community (more on this below).
The next main task is to reinstate the original PlanningAlert functionality. That’s our focus now, and we’re about halfway there (and aiming to get the first alerts going out in the next 2-3 weeks).
We’ve also got several more councils and planning application systems to add, and this should bring the number of councils we’ve got on the system to between 150 and 200. This will be an ongoing process, over the next couple of months. There’ll also be some much-overdue design work on OpenlyLocal so that the increased amount of information on there is presented to the user in a more intuitive way – please feel free to contact us if you’re a UX person/designer and want to help out.
We also need to improve the database backend. We’ve been using MySQL exclusively since the start, but MySQL isn’t great at spatial (i.e. geographic) searches, restricting the sort of functionality we can offer. We expect to sort this in a month or so, probably moving to PostGIS, and after that we can start to add more features, finer grained searches, and start to look at making the whole thing sustainable by offering premium services.
We’ll be working too on liaising with councils who want to offer their applications via an API – as the ever pioneering Lichfield council already does – or a nightly data dump. This not only does the right thing in opening up data for all to use, but also means we don’t have to scrape their websites. Lichfield, for example, uses the Idox system, and the web interface for this (which is what you see when you look at a planning application on Lichfield’s website) spreads the application details over 8 different web pages, but the API makes this available on a single URL, reducing the work the server has to do.
Finally, we’re going to be announcing a bounty scheme for the scraper/developer community to write scrapers for those areas that don’t use one of the standard systems. Andrew will be coordinating this, and will be blogging about this sometime in the next week or so (and you can contact him at planning at openlylocal dot com). We’ll also be tweeting progress at @planningalert.
Thanks for your patience.
A couple of months ago, I blogged about the ridiculous situation of a local councillor being hauled up in front of the council’s standards committee for posting a council webcast onto YouTube, and worse, being found against (note: this has since been overturned by the First Tier Tribunal for Local Government Standards, but not without considerable cost for the people of Brighton).
At the time I said we should make the following demand:
Give the public the right to record any council meeting using any device using Flip cams, tape recorders, frankly any darned thing they like as long as it doesn’t disrupt the meeting.
Step forward councillor Liam Maxwell from the Royal Borough of Windsor & Maidenhead, who as the cabinet member for transparency has a personal mission to make RBWM the most transparent council in the country. I don’t see why you couldn’t do that our council, he said.
So, last night, I headed over to Maidenhead for the scheduled council meeting to test this out, and either provide a shining example for other councils, or show that even the most ‘transparent’ council can’t shed the pomposity and self-importance that characterises many council meetings, and allow proper open access.
The video below, less than two minutes long, is the result, and as you can see, they chose the latter course:
Interestingly, when asked why videoing was not allowed, they claimed ‘Data Protection’, the catch-all excuse for any public body that doesn’t want to publish, or open up, something. Of course, this is nonsense in the context of a public meeting, and where all those being filmed were public figures who were carrying out a civic responsibility.
There’s also an interesting bit to the end when a councillor answered that they were ‘transparent’ in response to the observation that they were supposed to be open. This is the same old you-can-look-but-don’t touch attitude that has characterised much of government’s interactions with the public (and works so well at excluding people from the process). Perhaps naively, I was a little shocked to hear this from this particular council.
So there you have it. That, I guess, is where the boundaries of transparency lies at Windsor & Maidenhead. Why not test them out at your council, and perhaps we can start a new scoreboard at OpenlyLocal to go with the open data scoreboard, and the 10:10 council scoreboard
I took a very frustrating phone call earlier today from NESTA, an organisation I’ve not had any dealings with it before, and don’t actually have a view about it, or at least didn’t.
It followed from an email I’d received a couple of days earlier, which read:
I am contacting you about a project NESTA are currently working on in partnership with the Big Society Network called Your Local Budget.
Working with 10 pioneer local authorities, we are looking at how you can use participatory budgeting to develop new ways to give people a say in how mainstream local budgets are spent. Alongside this we will also be developing an online platform that enables members of the public to understand and scrutinise their local authority’s spending, and connect with each other to generate ideas for delivering better value for money in public spending.
We would like to share our thinking and get your thoughts on the online tool to get a sense of what is needed and where we can add value. You are invited to a round table discussion on Friday 19 November, 11am – 12.30pm at NESTA that will be chaired by Philip Colligan, Executive Director of the Public Services Lab. Following the meeting we intend to issue an invitation to tender for the online tool.
Apart from the short notice & terrible timing (it clashes with the Open Government Data Camp, to which you’d hope most of the people involved would be going), the main question I had was this:
I got the phone call because I couldn’t make the round table, and for some feedback, and this was the feedback I gave: I don’t understand why this is being done. At all.
Putting aside the participatory budgeting part (although this problem seems to be getting dealt with by Redbridge council and YouGov, whose solution is apparently being offered to all councils), there’s the question of the “online platform that enables members of the public to understand and scrutinise their local authority’s spending, and connect with each other to generate ideas for delivering better value for money in public spending.“
Excuse me? Most of the data hasn’t been published yet, there are several known organisations and groups (including OpenlyLocal) that have publicly stated they going to to be importing this data and doing things with it – visualising it, and allowing different views and analysis. Additionally, OpenlyLocal is already talking with several newspaper groups to help them re-use the data, and we are constantly evolving how we match and present the data.
Despite this, Nesta seems to have decided that it’s going to spend public money on coming up with a tendered solution to solve a problem that may be solved for zero cost by the private sector. Now I’m no roll-back-the-government red-in-tooth-and-claw free marketeer, but this is crazy, and I said as much to the person from Nesta.
Is the roundtable to decide whether the project should be done, or what should be done? I asked. The latter I was told. So, they’ve got some money and have decided they’re going to spend it, even though the need may not be there. At a time when welfare payments are being cut, essential services are being slashed, for this sort of thing to happen is frankly outrageous.
There are other concerns here too – I personally think websites such as this are not suitable for a tender process, as that doesn’t encourage or often even allow the sort of agile, feedback-led process that produces the best websites. They also favour those who make their living by tendering.
So, Nesta, here’s a suggestion. Park this idea for 12 months, and in the meantime give the money back to the government. If you want to act as an angel funding then act as such (and the ones I’ve come across don’t do tendering). A reminder, your slogan is ‘making innovation flourish’, but sometimes that means stepping back and seeing what happens. This is not the way to building a Big Society
Since OpenlyLocal started pulling in council spending data, it’s niggled at me that it’s only half the story. Yes, as more and more data is published you’re beginning to get a much clearer idea of who’s paid what. And if councils publish it at a sufficient level of detail and consistently categorised, we’ll have a pretty good idea of what it’s spent on too.
However, useful though that is, that’s like taking a peak at a company’s bank statement and thinking it tells the whole story. Many of the payments relate to goods or services delivered some time in the past, some for things that have not yet been delivered, and there are all sorts of things (depreciation, movements between accounts, accruals for invoices not yet received) that won’t appear on there.
That’s what the council’s accounts are for — you know, those impenetrable things locked up in PDFs in some dusty corner of the council’s website, all sufficiently different from each other to make comparison difficult:
For some time, the holy grail for projects like OpenlyLocal and Where Does My Money Go has been to get the accounts in a standardized form to make comparison easy not just for accountants but for regular people too.
The thing is, such a thing does exist, and it’s sent by councils to central Government (the Department for Communities and Local Government to be precise) for them to use in their own figures. It’s a fairly hellishly complex spreadsheet called the Revenue Outturn form that must be filled in by the council (to get an idea have a look at the template here).
They’re not published anywhere by the DCLG, but they contain no state secrets or sensitive information; it’s just that the procedure being followed is the same one as they’ve always followed, and so they are not published, even after the statistics have been calculated from the data (the Statistics Act apparently prohibit publication until the stats have been published).
So I had an idea: wouldn’t it be great if we could pull the data that’s sitting in all these spreadsheets into a database and so allow comparison between councils’ accounts, thus freeing it from those forgotten corners of government computers.
This would seem to be a project that would be just about simple enough to be doable (though it’s trickier than it seems) and could allow ordinary people to understand their council’s spending in all sorts of ways (particularly if we add some of those sexy Where Does My Money Go visualisations). It could also be useful in ways that we can barely imagine – some of the participatory budget experiments going in on in Redbridge and other councils would be even more useful if the context of similar councils spending was added to the mix.
So how would this be funded. Well, the usual route would be for DCLG or perhaps the one of the Local Government Association bodies such as IDeA to scope out a proposal, involving many hours of meetings, reams of paper, and running up thousands of pounds in costs, even before it’s started.
They’d then put the process out to tender, involving many more thousands in admin, and designed to attract those companies who specialise in tendering for public sector work. Each of those would want to ensure they make a profit, and so would work out how they’re going to do it before quoting, running up their own costs, and inflating the final price.
So here’s part two of my plan, instead going down that route, I’d come up with a proposal that would:
- be a fraction of that cost
- be specified on a single sheet of paper
- paid for only if I delivered
Obviously there’s a clear potential conflict of interest here – I sit on the government’s Local Public Data Panel and am pushing strongly for open data, and also stand to benefit (depending on how good I am at getting the information out of those hundreds of spreadsheets, each with multiple worksheets, and matching the classification systems). The solution to that – I think – is to do the whole thing transparently, hence this blog post.
In a sense, what I’m proposing is that I scope out the project, solving those difficult problems of how to do it, with the bonus of instead of delivering a report, I deliver the project.
Is it a good thing to have all this data imported into a database, and shown not just on a website in a way non-accountants can understand, but also available to be combined with other data in mashups and visualisations? Definitely.
Is it a good deal for the taxpayer, and is this open procurement a useful way of doing things? Well you can read the proposal for yourself here, and I’d be really interested in comments both on the proposal and the novel procurement model.