Quo Vadis
This year Nominatim celebrates its 11th birthday. On Nov 11th 2009 the OpenStreetMap (OSM) homepage used Nominatim as its main search engine for the first time. Since then OSM has grown enormously and with it the need for a geocoder based on OSM data. The OSM Nominatim servers alone now serve more than 30 million queries per day. In addition, there are many private installations out there.
Nominatim has always been a hobby project, maintained and developed by a few enthusiasts in their spare time. There has been constant work and improvements over the years but there was never really time to tackle the big issues that have been lingering since forever. It is time to move forward and change to a model where we can work on Nominatim full time to really bring it forward. For the moment, this means to collect enough funding for a full-time developer and maintainer role for myself.
We won’t change the licensing model. Nominatim will always remain a 100% open-source community project. We hope that you as a user of Nominatim will be interested in supporting development through one-time donations or regular funding.
To give you an idea of what plans we have, here are some of the major points on the todo list.
Localized data processing
When Nominatim was first developed, OSM was still small and had a limited set of tags with relatively little local differences. So Nominatim went for a relatively simple one-size-fits-all approach for reading the data and forming addresses. The algorithm has the added bonus that it is quite robust in the face of tagging errors and incomplete data. Over the years, the tagging schema in OSM has evolved. It is much more expressive and reflects much better the different realities in the different regions of the planet. Acknowledging these differences can help a lot to get better search results. Nominatim’s latest version 3.5 is already able to process administrative boundaries of each country in a different way. But this is only a first tiny step for a localized data processing. Local tagging practises in OSM are not well documented and often hidden in the minds of mappers and communication of local communities. We need to find a way to make this more explicit and better make use of it for search.
Fuzzy searching
One of the most lamented short-comings of Nominatim is that it is not very tolerant regarding spelling mistakes. It seems to be an easy enough problem to just ignore the wrong letter here or there. Unfortunately it is much harder than that. Place names are notoriously difficult to correct. They come in the strangest dialects, they don’t follow standard pronunciation rules and quite often have the spelling mistake already built in. One wrongly spelled letter can bring you to a place at the other end of the country. To make matters worse, there is the issue of different languages. Nominatim must always be able to search for any place on the planet and therefore find places in any possible language. It must be able to handle translated place names. And when you type in a search query then you may do that in any language as well, usually without explicitly telling which one you use. Matching all this up is not easy.
Currently Nominatim circumvents the language issue by completely ignoring it. All place names are handled exactly in the same way. That works okay as long as you strictly search for the name as it is in OSM. Before we can introduce fuzzy searching, Nominatim needs to learn about the different languages and it also needs to learn what queries in OSM usually look like. This requires some fundamental changes to how Nominatim handles the interpretation of place names and search queries.
Cleaning up the code
Nominatim’s code base consists of an interesting mix of PHP, C, C++, Python and PL/pgSQL. A lot of its logic is still hard coded and hidden somewhere in a function with hundreds of lines of code. This makes it very hard to start contributing or extending Nominatim. To make it easier for developers to get started we plan to further modularize the code and get rid of one or two programming languages in the mix. There are already a number of open issues marked as good issues for new contributors. We’ll grow this list in the coming months.
In the last years, we have spent a lot of time on making Nominatim easier to install
but it still requires quite a bit of knowledge in system administration to get
it running on your own machine. So there is more to be done.
Ideally we get to the point where you can just apt install nominatim
and be ready to start importing your own database.
Keep up with OpenStreetMap
OpenStreetMap has been growing at an astonishing rate in the last couple of years. The planet dump has long gone past 50GB in size and is still growing. Nominatim needs to keep up with this by making sure the data is processed more and more efficiently. The size of a Nominatim database has been around 1 TB for the last 11 years. The software always became that bit more efficient while the size of OSM increased. We intend to keep it below 1 TB for the next 11 years.
This is just a selection of the most important points to keep the post short. We’ll discuss some more details on the issues of geocoding in this blog in the weeks and months to come. If you are interested in sponsoring this work, please have a look at the Funding page.