Permission is granted to reprint/use these on the web so long as there is a link to my website.
Using the Postcode Address File on a mobile
How to hold 29 million Addresses in an iPhone
This was developed for a client who had licensed the Postcode Address File (PAF). The PAF is a very large text file generated from the addresses held in Royal Mail's database. There are 2,000-4,000 address changes each month, so a new version is issued monthly.
I wasn't sure if we were licensed to use the PAF on a mobile but that wasn't my problem. The project was abandoned for other reasons; a shame as the app was about 97% complete.
I was asked to develop an iPhone app that let you book taxis. To make it work, the client also wanted individual addresses, which is understandable: if the taxi isn't outside the correct house, the pick-up won't happen.
Selecting an address. The app was designed to use Google Maps, so it was possible to drag the map. As it dragged, the Google SDK returned the lat/long of the map centre. I wanted a fast reverse geosearch from lat/long to address, so I had to hold the addresses in some searchable form. This also allowed a text search.
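To give a feel for what "some searchable form" can mean for a lat/long lookup, here is a minimal sketch of a grid index. It is not the code from the app: the cell size, the function names and the sample points are all assumptions made up for illustration. Points are bucketed into small lat/long cells, and a query only checks the 3x3 block of cells around the map centre.

```python
import math
from collections import defaultdict

CELL = 0.01  # cell size in degrees; an arbitrary choice for this sketch


def cell_of(lat, lon):
    """Map a coordinate to the integer grid cell that contains it."""
    return (math.floor(lat / CELL), math.floor(lon / CELL))


def build_grid(points):
    """Bucket (lat, lon, label) tuples by grid cell."""
    grid = defaultdict(list)
    for lat, lon, label in points:
        grid[cell_of(lat, lon)].append((lat, lon, label))
    return grid


def nearest(grid, lat, lon):
    """Return the label of the closest point in the 3x3 block of cells
    around (lat, lon), or None if those cells are empty."""
    cy, cx = cell_of(lat, lon)
    scale = math.cos(math.radians(lat))  # longitude degrees shrink with latitude
    best, best_d = None, float("inf")
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            for plat, plon, label in grid.get((cy + dy, cx + dx), ()):
                d = (plat - lat) ** 2 + ((plon - lon) * scale) ** 2
                if d < best_d:
                    best, best_d = label, d
    return best


# Hypothetical sample points, not taken from the PAF.
grid = build_grid([
    (53.8175, -3.0357, "Street A"),
    (53.8200, -3.0300, "Street B"),
])
print(nearest(grid, 53.8176, -3.0356))
```

The trade-off in a scheme like this is the cell size: smaller cells mean fewer candidates per lookup, but a point just outside the 3x3 block is missed.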
Working with the PAF
The version I had at the time had 29,557,600 lines in it with an average length of 64 bytes. It looks a bit like this:
If you scan through it, you'll see that it's a real mishmash. Given the variety of address parts it has, I get the feeling it was thrown together.
To get something useful from it, I devised a suite of programs. These diced and sliced and eventually ended up with 2 data files (streets.dat and numbers.dat) plus their index files.
Streets.dat holds approximately 1.7 million points. Each one has a latitude and longitude, the street name, a postcode and an index into the numbers data file. Each point is only the segment of a street that corresponds to one postcode; it's not unusual for a street to have 5, 6 or more different postcodes. Pairing a postcode with a street name also distinguishes between streets that share a name. Think how many towns have a Church Road, High Street, Main Street etc.
Streets.dat was only 20 MB in size but numbers.dat was 50 MB. I tried every technique I could think of to shrink it. The problem was that any street/postcode could have anywhere from one to many addresses. If it was something simple like 100 houses numbered 1,3,5,7..99 and 2,4,6..100 then I could encode those as something like 1T99, 2T100.
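The 1T99 idea can be sketched in a few lines. This is an illustration rather than the original encoder: it assumes "T" simply means "every second number from the first to the last", and the function names are made up. Odd and even sides of the street are handled separately, and any number that doesn't fit a run is written out on its own.

```python
def encode_numbers(numbers):
    """Collapse house numbers into step-2 runs such as '1T99,2T100'."""
    runs = []
    for parity in (1, 0):  # odd side of the street first, then even
        side = sorted(n for n in set(numbers) if n % 2 == parity)
        i = 0
        while i < len(side):
            j = i
            # Extend the run while the next number is exactly 2 higher.
            while j + 1 < len(side) and side[j + 1] == side[j] + 2:
                j += 1
            runs.append(str(side[i]) if i == j else f"{side[i]}T{side[j]}")
            i = j + 1
    return ",".join(runs)


def decode_numbers(text):
    """Expand the runs back into individual house numbers."""
    if not text:
        return []
    out = []
    for run in text.split(","):
        if "T" in run:
            first, last = run.split("T")
            out.extend(range(int(first), int(last) + 1, 2))
        else:
            out.append(int(run))
    return out


print(encode_numbers(range(1, 101)))  # 1T99,2T100
```

A street with gaps simply produces more runs, so the saving depends on how regular the numbering is.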
Approximately half the houses have numbers (850,000) and the rest have names. I tried pattern matching for flats but there were just too many differing numbering systems so in the end, I used a simple LZMA text compression for names as well.
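As a rough illustration of the LZMA step, here is a sketch using Python's standard lzma module. The app itself was not written in Python, and the newline-separated layout and the sample names are assumptions for the example only.

```python
import lzma


def pack_names(names):
    """Join house names with newlines and compress the lot with LZMA."""
    raw = "\n".join(names).encode("utf-8")
    return lzma.compress(raw)


def unpack_names(blob):
    """Reverse pack_names: decompress and split back into names."""
    text = lzma.decompress(blob).decode("utf-8")
    return text.split("\n") if text else []


# Hypothetical names, repeated to mimic the redundancy in real address data.
names = ["Rose Cottage", "The Old Vicarage", "Flat 1, Sea View"] * 100
blob = pack_names(names)
print(len("\n".join(names)), "bytes raw,", len(blob), "bytes compressed")
```

Compression like this only pays off on larger blocks of text; a handful of short names on their own can come out bigger than they went in.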
There were around 3 million postcodes in use. Postcodes can be converted to numbers. UK postcodes are a maximum of 8 characters long, counting the space, and the last 4 are always space, digit, letter, letter, e.g. " 4XL".
There are approximately 3,000 postal districts (the first part of the postcode, known as the outward code, e.g. FY5). The shortest are 2 characters long, e.g. B1, and the longest are 4 characters, e.g. BT65. The second part (known as the inward code) uses a subset of the letters and never includes C, I, K, M, O or V. You will never see, for example, a postcode ending in 4CK.
This means there are a maximum of 4,000 possible inward codes: 10 x 20 x 20. That's 10 digits followed by two letters, each from the 20 that are left once those 6 are excluded.
All UK postcodes fit into 12 million unique values, as that's 3,000 outward x 4,000 inward, so every postcode can be converted to a number in the range 0-11,999,999. A postcode can therefore be stored as a number in just 3 bytes, as 2^24 is about 16.7 million.
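The arithmetic can be sketched as follows. This is an illustration, not the original code: the outward code table here is a tiny made-up sample (the real one would hold all 3,000 or so districts), and numbering the outward codes by their position in that table is an assumption.

```python
# The 20 letters allowed in an inward code: the alphabet minus C, I, K, M, O, V.
INWARD_LETTERS = "ABDEFGHJLNPQRSTUWXYZ"

# Hypothetical sample; a real table would list every outward code.
OUTWARD_CODES = ["B1", "BT65", "FY5"]


def postcode_to_number(postcode):
    """Convert e.g. 'FY5 4XL' to outward_index * 4000 + inward_index."""
    outward, inward = postcode.upper().split()
    o = OUTWARD_CODES.index(outward)
    digit = int(inward[0])
    a = INWARD_LETTERS.index(inward[1])
    b = INWARD_LETTERS.index(inward[2])
    return o * 4000 + digit * 400 + a * 20 + b


def number_to_postcode(n):
    """Reverse postcode_to_number."""
    o, rest = divmod(n, 4000)
    digit, rest = divmod(rest, 400)
    a, b = divmod(rest, 20)
    return f"{OUTWARD_CODES[o]} {digit}{INWARD_LETTERS[a]}{INWARD_LETTERS[b]}"


def pack3(n):
    """Store the number in 3 bytes; anything below 2**24 fits."""
    return n.to_bytes(3, "big")


def unpack3(data):
    return int.from_bytes(data, "big")


n = postcode_to_number("FY5 4XL")
print(n, pack3(n).hex(), number_to_postcode(unpack3(pack3(n))))
```

With the full table the largest value is 2,999 x 4,000 + 3,999 = 11,999,999, which is comfortably below 2^24.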