aosm() – Acquiring, Filtering and Analysing OpenStreetMap Data

27-Dec-2017: Updated the script and examples to work with the new Mapzen metro extracts.

Following up on the last post, where I outlined my overall understanding and plan for the intended function, today I finished the first working version of the aosm() function. As promised, I am posting the code along with explanations and instructions. Before going into the function, I would like to stress the importance of setting up the system correctly. Getting all the software installed, with the system path variable updated to include the executables, was the biggest problem I faced while trying to get the function running on other systems. So please read the system requirements section carefully before trying to run the script.


instructions  (demo section is outdated, see end of this post for working examples)

System Requirements

1. Windows operating system (preferably Windows 7) with administrator rights, since the software listed below has to be installed.

2. R 2.15.2 installed and, if possible, RStudio, which provides a better user interface (RStudio needs an R installation to work). Since there is nothing else in the script file, the function can be loaded into the R workspace directly by running the command source("c:\\%location%\\aosm.r"). [Update (27 Mar ’13): on Windows 8 the R session must be run in administrator mode for the system() commands to work.]

3. Osmosis installed on the system, with the location of the ‘osmosis.bat’ file (inside the bin folder) added to the system path variable. This is really important: the function relies heavily on Osmosis and will not work without it. Osmosis is based on Java, so make sure Java is installed and the system path variable includes the Java location as well. To check that everything is OK, open a command prompt and run the commands “java” and “osmosis” to see if they are recognized.

4. 7-Zip installed on the system, with the location of the ‘7z.exe’ file added to the system path variable. This is equally important if you don’t have the OSM data locally in .osm format. After running the installer, manually add the 7-Zip folder to the system path. Again, to check, run “7z” in the command prompt and see if it is recognized.

5. Since a lot of data needs to be downloaded and extracted, at least 2 GB of free hard drive space is recommended. The OSM data for London is around 150 MB when zipped and almost 1.5 GB when extracted, so make sure you don’t run out of space. If you have multiple drives, set your working directory to a drive with at least 2 GB of free space before running the function (the function does not change the working directory, so all downloads and temporary files are kept in the current working directory).

6. Finally, the most important one is an internet connection. The function was envisioned as a way to get data directly from the internet, and the option to read local data was built in to reduce the running time of consecutive runs. So the function makes a lot of references to the internet and strictly requires connectivity to work. (I know this is absurd and am trying to build a workaround. I realized this was a real problem when an internet outage lasting half a day in the UCL Halls left me paralyzed while developing the function.)
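The command-prompt checks in items 3 and 4 can also be done from inside R. The snippet below is only a minimal sketch and is not part of aosm.r; the helper name check_tools is made up for illustration. It relies on base R's Sys.which(), which returns an empty string for programs it cannot find on the path.

```r
# Hypothetical helper (not part of aosm.r): report which of the external
# tools required by the function are visible on the system path.
check_tools <- function(tools = c("java", "osmosis", "7z")) {
  paths <- Sys.which(tools)      # "" for tools that are not found
  found <- nzchar(paths)
  names(found) <- tools
  found
}

status <- check_tools()
print(status)
if (!all(status)) {
  message("Not found on the path: ",
          paste(names(status)[!status], collapse = ", "))
}
```

A FALSE entry means the corresponding folder still has to be added to the system path variable before the function is run.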

The Function


name.object <- aosm("world", "geo-filter", "tag-filter", "analysis", "type")


“world” – A string, which is the name of the city for which the data is to be prepared. The string has to be in lowercase, and any spaces in the city name have to be replaced with “-” (e.g. “london”, “san-francisco”). The script checks for a local file in the current working directory and, if it cannot find one, downloads the data from the OSM extracts. There are three formats in which this data can be supplied locally: a ‘.osm’ file, a ‘.osm.pbf’ file or a compressed archive.
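To make the naming rule and the local lookup concrete, here is a small sketch. It is an illustration only and not the code used inside aosm(): the helper names normalise_name and find_local_extract are hypothetical, and only the two file extensions spelled out above are checked (the archive format is left out).

```r
# Hypothetical helpers (not part of aosm.r).

# Turn a city name into the form the function expects:
# lowercase, with spaces replaced by "-".
normalise_name <- function(x) gsub(" +", "-", tolower(trimws(x)))

# Look for a local extract in `dir`; return its path, or NA if none exists.
find_local_extract <- function(world, dir = getwd(),
                               exts = c(".osm", ".osm.pbf")) {
  candidates <- file.path(dir, paste0(normalise_name(world), exts))
  hit <- candidates[file.exists(candidates)]
  if (length(hit) > 0) hit[1] else NA_character_
}

normalise_name("San Francisco")   # "san-francisco"
find_local_extract("london")      # NA unless london.osm(.pbf) is in the working directory
```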

“geo-filter” – A string, which is the name of the area within the city for which the data has to be extracted. The string has to be in lowercase, and any spaces in the name have to be replaced with “-” (e.g. “islington”, “city-of-london”). It denotes the name of a .poly file, which can be supplied locally or downloaded by the function from the internet. As of now, I have the boundaries of all the boroughs in London hosted on the server and will add more as I get time. The other way to supply the boundary is to keep a shape file named “boundary.shp” in the current working directory, containing the polygon you want to use, with the name of the polygon in the attribute table under the header “NAME”. By default, any shape file named “boundary.shp” in the current directory is converted into individual polygon files, with the strings in the “NAME” column used as file names. For example, if you keep a London borough boundary shape file (named “boundary.shp”) in the working directory, the function will extract all the boroughs as .poly files.
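For readers who want to hand-craft a boundary, the sketch below writes a polygon in the plain-text polygon filter format that Osmosis reads: a name line, a section line, one "longitude latitude" pair per line, and two END lines. It is an illustration only, not the shape file conversion done by the script; the helper name poly_lines is hypothetical and the rectangle is a made-up box, not the real City of London boundary.

```r
# Hypothetical helper (not part of aosm.r): build the lines of a .poly file
# from a two-column matrix of longitude/latitude pairs.
poly_lines <- function(name, coords) {
  stopifnot(is.matrix(coords), ncol(coords) == 2, nrow(coords) >= 3)
  # Close the ring if the last point does not repeat the first one.
  if (!all(coords[1, ] == coords[nrow(coords), ])) {
    coords <- rbind(coords, coords[1, ])
  }
  c(name,                                   # polygon file name
    "1",                                    # name of the first (only) section
    sprintf("   %.7f   %.7f", coords[, 1], coords[, 2]),
    "END",                                  # end of section
    "END")                                  # end of file
}

# Made-up rectangle for illustration only.
box <- matrix(c(-0.11, 51.50,
                -0.07, 51.50,
                -0.07, 51.52,
                -0.11, 51.52), ncol = 2, byrow = TRUE)

poly <- poly_lines("city-of-london", box)
writeLines(poly, file.path(tempdir(), "city-of-london.poly"))
```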

“tag-filter” – A string denoting the filter definition. The syntax is “switch_name”, where “switch” is either “d” or “t”, denoting whether the name is a definition file or the tag filter itself. OSM has a really straightforward way of tagging its features: every tag has a key and a value. For example, a way can have the key “building” with the value “yes”, which marks it as a building, or the key “highway” with the value “residential”, which makes it a residential street. So there are two ways of building a tag-based filter. One is to write the key and value directly in the function call (t_highway,residential or t_building,yes). The other, for more complex filters, is to write a definition file, keep it as a text file in the working directory and point to it in the function call (for example d_landuse, where the function will search for a file named “landuse.txt” for the definitions). I have hosted two sample definition files on the server, which the function can download. If the tag-filter definition contains only a key and no value, then all the features with that key are extracted regardless of their values. The keys and values used by OSM volunteers are listed on the wiki page and on taginfo, which give an idea of how things are organised in OSM.
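The “switch_name” syntax can be taken apart with a few lines of base R. The following is a rough sketch of one way to do it, not the parsing code inside aosm(); the function name parse_tag_filter and the shape of the returned list are assumptions made for this example.

```r
# Hypothetical helper (not part of aosm.r): split a tag-filter string such as
# "t_highway,residential", "t_building" or "d_landuse" into its parts.
parse_tag_filter <- function(x) {
  switch_chr <- substr(x, 1, 1)
  if (substr(x, 2, 2) != "_" || !switch_chr %in% c("d", "t")) {
    stop("tag-filter must look like 'd_name' or 't_key[,value]'")
  }
  body <- substring(x, 3)
  if (switch_chr == "d") {
    # "d" points to a definition file in the working directory
    return(list(type = "definition", file = paste0(body, ".txt")))
  }
  # "t" carries the key and, optionally, the value directly
  kv <- strsplit(body, ",", fixed = TRUE)[[1]]
  list(type  = "tag",
       key   = kv[1],
       value = if (length(kv) > 1) kv[2] else NA_character_)
}

parse_tag_filter("t_highway,residential")  # key "highway", value "residential"
parse_tag_filter("t_building")             # key "building", value NA (any value)
parse_tag_filter("d_landuse")              # definition file "landuse.txt"
```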

“analysis” – A string, which defines the type of analysis to be done on the extracted data and the type of result expected from the function. It currently supports the following values: “default” (for an sp object), “utm” (for an sp object with CRS), “cn” (for the count of features), “ar” (for the sum of the areas of all the features) and “len” (for the sum of the lengths/perimeters of all the features). A detailed explanation can be found in the outputs section.

“type” – A string with one of these three values: “points”, “lines” or “polygons”. This determines the type of sp object returned by the function.


The output of the function differs significantly depending on the “analysis” string in the input. The possible strings and the corresponding analyses are given below.

“default” – returns an sp class object (SpatialPointsDataFrame, SpatialLinesDataFrame or SpatialPolygonsDataFrame) with added attributes showing the key-value tags and the name tags, without any CRS information.

“utm” – returns an object similar to the above, but with CRS information using Universal Transverse Mercator and WGS84.

“cn” – returns the count of features

“area” – returns the sum of the areas of all the features in the resulting data, in square metres.

“len” – returns the sum of the lengths/perimeters of all the features in the resulting data, in metres.
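Areas and lengths only come out in metres if the data is in a projected CRS, which is why the UTM step matters. As a hedged illustration (I am not claiming this is how the script picks the zone), the standard rule for finding the UTM zone from a longitude and building a PROJ.4 string looks like this. The function names are hypothetical, and the special zone exceptions around Norway and Svalbard are ignored.

```r
# Hypothetical helpers (not part of aosm.r).

# Standard UTM zone rule: 6-degree bands counted eastwards from 180 W.
utm_zone <- function(lon) (floor((lon + 180) / 6) %% 60) + 1

# Build a PROJ.4 string for the UTM zone containing the given point.
utm_proj4 <- function(lon, lat) {
  south <- if (lat < 0) " +south" else ""
  sprintf("+proj=utm +zone=%d%s +datum=WGS84 +units=m +no_defs",
          as.integer(utm_zone(lon)), south)
}

utm_zone(-0.09)            # 30 (central London)
utm_proj4(-0.09, 51.51)    # "+proj=utm +zone=30 +datum=WGS84 +units=m +no_defs"
```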


The function aosm() takes 5 inputs and applies 16 sub-functions to them to generate the results. Since I have attached the source code of the script and this post is getting really long, I will keep the explanation brief.

The function first sets up the environment by installing all the required packages. It then checks the working directory for a boundary shape file and, if one is found, converts it to .poly files. Next it evaluates the inputs to see where and in what formats the required data exists, and creates a data frame describing the situation. This involves checking for locally available data and for data available from internet sources in all compatible formats. The next step examines this data frame and decides whether the function can continue. If it finds any errors or missing information, it reports the error and stops the function before any intensive task is started. Once the validity of the inputs is confirmed, the function gathers the data from the available formats, downloading it if necessary, and converts it to the desired format. Here local data is given preference over data on the internet. Once the data is in place, the function builds the Osmosis filtering command from the inputs and runs it through a system() call. Once the filtering is complete, osmar is used to import the filtered data file into an sp object, and the extra attribute information is attached to it. The resulting object is then projected using the UTM projection and the WGS84 datum. As the final step, based on the inputs, the function applies the appropriate analysis to the sp object and returns the results.
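To give an idea of what the filtering step looks like, here is a sketch that assembles an Osmosis command line from the three filter inputs. It is an assumption about the general shape of such a call, not a copy of the command in aosm.r: the helper name build_osmosis_cmd is hypothetical, only the direct “t_” form of the tag-filter is handled, and the output file name simply follows the pattern visible in the error message in the comments below (london_city-of-london_t_building.osm). The Osmosis tasks used (--read-xml, --bounding-polygon, --tag-filter, --used-node, --write-xml) are standard ones.

```r
# Hypothetical helper (not part of aosm.r): build an Osmosis command that
# clips an extract to a polygon and keeps only ways carrying a given tag.
build_osmosis_cmd <- function(world, geo, tag_filter) {
  # "t_highway,residential" -> "highway=residential"; "t_building" -> "building=*"
  kv  <- strsplit(sub("^t_", "", tag_filter), ",", fixed = TRUE)[[1]]
  tag <- paste0(kv[1], "=", if (length(kv) > 1) kv[2] else "*")
  output <- paste0(world, "_", geo, "_", tag_filter, ".osm")
  paste("osmosis",
        "--read-xml", shQuote(paste0("file=", world, ".osm")),
        "--bounding-polygon", shQuote(paste0("file=", geo, ".poly")),
        "--tag-filter", "accept-ways", shQuote(tag),
        "--used-node",
        "--write-xml", shQuote(paste0("file=", output)))
}

cmd <- build_osmosis_cmd("london", "city-of-london", "t_building")
cat(cmd, "\n")
# system(cmd)   # uncomment to run; needs osmosis on the path and the input files
```

Printing the command before running it makes it easy to paste into a command prompt and see Osmosis' own error messages when something goes wrong.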

If you are really interested in how the script works, the following chart shows how all the sub-functions are tied together and how they process the inputs. I would also recommend consulting the instructions file referred to at the start, which explains all the functions as well.

Complete Flow Diagram of the script


map <- aosm("london_england", "city-of-london", "t_building", "default", "polygon")
plot(map, col = "#B43104", border = "#B43104")


map <- aosm("london_england", "city-of-london", "t_building", "utm", "polygon")
plot(map, col = "#B43104", border = "#B43104")

will return the following plot,


will return the value 1364


will return the value  1312264.90810641


will return the value 337414.610756376

So, concluding this extremely long and drab post, I would ask readers to give it a try and share their results and problems in the comments section below. Also feel free to put in your suggestions, point out any mistakes and tell me about any other existing solutions which may serve the same purpose.

5 thoughts on “aosm() – Acquiring, Filtering and Analysing OpenStreetMap Data”

  1. Hello,
    I’d like to thank you for sharing this function and to report an error I’m getting while trying to run your example.
    I’m using R 3.0.2 on windows 8, with java, osmosis and 7zip correctly working, and I get this error:

    > map<-aosm("london","city-of-london","t_building","default","polygon")
    Error in file(con, "r") : cannot open the connection
    In addition: Warning message:
    In file(con, "r") :
    cannot open file 'london_city-of-london_t_building.osm': No such file or directory

  2. Hi Roberto,

    Happy to see that you find the function useful.

    Regarding the error, The file ‘london_city-of-london_t_building.osm’ is created as a result of the filtering process done by osmosis, so I guess osmosis is not working correctly.

    1) check if the london.osm.bz2 is downloaded successfully in your working directory.
    2) check if the london.osm is created successfully in the working directory.

    if both steps are successful, then you may use the windows command line to navigate to the working directory and run the osmosis command manually ( here ) to check if osmosis is working properly.

    if it is not working, get a screenshot of the error message and send it to me. I’ll check and let you know.

  3. Thanks.

    I had those files in my working directory, but their dimension was less than 2MB, I deleted them and tried the function again and it worked. I guess something went wrong with my connection when I ran the function the first time, and those partial files generated the error when i tried it again.

    I also tested it on a Mac OS X with the 7z command installed with Homebrew by the command “brew install p7zip” and it worked.

  4. Perfect! All the best.

    One thing that is a mild irritant with the function is that it gives preference to local data over data from the internet. Though this improves performance in subsequent runs, it runs into the problem mentioned above when the first run leaves a bad file behind. I think I should add a switch to make the function start with a clean slate. Will do it and update when I get some time.
