Geolocation Matching Based on Latitudes and Longitudes in R

Today I was faced with an interesting challenge: finding time zones for users identified only by latitude and longitude. Thanks to R's extensive package ecosystem, the solution was not hard at all.

First, I found a list of cities with populations greater than 1,000 online. The list is quite big, almost 150K cities. Brute-force distance calculations against every city for each of the 48K users that had location information would have taken exceedingly long.

Luckily, there are tree-based algorithms that make quick work of this kind of problem. I used the RANN package in R, which is incredibly intuitive to use. Here is the code I used.

# Import library
library(RANN)

# Read in geolocation data (tab-separated)
geoloc <- read.table("data/cities1000.txt", sep = "\t", quote = "")
# Column names
colnames(geoloc) <- c("id", "name", "asciiname", "latitude", "longitude", "feature_class", "feature_code", "country_code", "cc2", "admin1_code", "admin2_code", "admin3_code", "admin4_code", "pop", "elevation", "dem", "tz", "ModDate")
# Subset the useful part: just the coordinate columns
geoloc1 <- geoloc[, c("latitude", "longitude")]
# Create a list of unique locations from our own data to match against the geolocation data
test <- unique(raw[, c("latitude", "longitude")])
# Get the nearest city for each location
neighbor <- nn2(geoloc1, na.exclude(test), k = 1, treetype = "bd")
test <- cbind(na.exclude(test), tz = geoloc[neighbor$nn.idx, "tz"])
raw1 <- merge(raw, test, all.x = TRUE, by = c("latitude", "longitude"))
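The matching step can be sketched end to end on toy data. The city names and coordinates below are made up purely for illustration; only the nn2 call mirrors the approach above:

```r
library(RANN)

# Hypothetical reference table standing in for the cities file
cities <- data.frame(
  name      = c("A", "B", "C"),
  latitude  = c(40.7, 51.5, 35.7),
  longitude = c(-74.0, -0.1, 139.7)
)

# Two query points, each close to one of the cities above
queries <- data.frame(
  latitude  = c(40.6, 35.6),
  longitude = c(-73.9, 139.6)
)

# k = 1 returns the single nearest reference row for each query
nn <- nn2(cities[, c("latitude", "longitude")], queries, k = 1, treetype = "bd")
matched <- cities$name[nn$nn.idx]
matched  # "A" "C"
```

The `nn$nn.idx` matrix holds row indices into the reference table, which is why indexing `geoloc` by it pulls the matching time zone.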

The code runs incredibly fast; I had time zone data in no time.
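One caveat worth noting (my observation, not from the original post): nn2 measures plain Euclidean distance in degree space, which distorts near the poles and across the international date line. For coarse matching like time zones this is usually tolerable, but a common fix is to convert latitude/longitude to 3D Cartesian coordinates on the unit sphere before building the tree:

```r
# Convert degrees of latitude/longitude to unit-sphere (x, y, z) coordinates,
# so straight-line distance respects the wrap-around at +/-180 degrees
to_xyz <- function(lat, lon) {
  lat <- lat * pi / 180
  lon <- lon * pi / 180
  cbind(cos(lat) * cos(lon), cos(lat) * sin(lon), sin(lat))
}

# Two points straddling the date line: far apart in degree space,
# but nearly touching on the sphere
a <- to_xyz(0, 179.9)
b <- to_xyz(0, -179.9)
d <- sqrt(sum((a - b)^2))
d  # a small chord length, reflecting the true proximity
```

Feeding `to_xyz()` output to nn2 in place of the raw degree columns would make the nearest-neighbor search geometrically consistent everywhere on the globe.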

