In Part 1, I covered collecting Plat IDs with Selenium by executing JavaScript. In Part 2, I detailed mining a table with Selenium and BeautifulSoup and generated a full dataset of GIS data. In Part 3, I will finish the series by showing how to plot this data on a map using folium.
Latitude & Longitude
Some GIS datasets provided by counties already contain latitude and longitude values; in this dataset, however, I had to generate those values from an address. Why did I have to generate them? In short, addresses are not sufficient for plotting, so they must be converted into latitude and longitude. There are several options for doing this, some requiring payment, such as Google's, but I chose a less accurate yet free option, Nominatim. Import the geocoder as well as RateLimiter, which keeps us from hammering the API with too many requests.
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
The addresses are read into a dataframe as always, and Nominatim and RateLimiter are set up to enable retries as well as a delay between queries.
loc = Nominatim(user_agent="GetLoc")
gc = RateLimiter(loc.geocode, min_delay_seconds=2, max_retries=5)
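To make the retry-and-delay idea concrete, here is a minimal sketch of the kind of wrapper RateLimiter provides. It is an illustration only and not geopy's implementation; the name with_delay_and_retries and the injectable sleep parameter are assumptions I made up for the example.

```python
import time


def with_delay_and_retries(func, min_delay_seconds=2.0, max_retries=5,
                           sleep=time.sleep):
    """Wrap func so calls are spaced out and failed calls are retried.

    Hypothetical helper for illustration; not part of geopy.
    """
    last_call = None  # monotonic timestamp of the most recent call

    def wrapper(*args, **kwargs):
        nonlocal last_call
        for attempt in range(max_retries + 1):
            # Wait until min_delay_seconds have passed since the last call.
            if last_call is not None:
                remaining = min_delay_seconds - (time.monotonic() - last_call)
                if remaining > 0:
                    sleep(remaining)
            last_call = time.monotonic()
            try:
                return func(*args, **kwargs)
            except Exception:
                # Retry on any error; give up with None after the last attempt.
                if attempt == max_retries:
                    return None

    return wrapper
```

Every call, including each retry, waits out the delay first, which is what keeps a long overnight run from flooding the service.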
Finally, we call Nominatim with the property address and get latitude and longitude in response.
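for index, row in df.iterrows():
    # Geocode the address; the result is None when no match is found
    curloc = gc(row['Property_Address'])
    if curloc is not None:
        # Use .loc so the values are written into df itself, not a temporary copy
        df.loc[index, 'Latitude'] = curloc.latitude
        df.loc[index, 'Longitude'] = curloc.longitude
    print(index)  # progress indicator
# Save the results once the loop finishes
df.to_csv('latlong.csv')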
This process took quite a while because I had tens of thousands of addresses to convert, so I ran it overnight. It is also worth noting that I ran into timeouts without rate limiting, and that not all addresses converted, so I ended up with a subset of the data. Because the dataset was so large, I was willing to accept this and move forward, but it may be an issue if a complete dataset is required.
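If a complete dataset matters, one option is to keep track of which addresses failed so they can be retried or fixed by hand later. The sketch below is not what I ran; geocode_unique is a hypothetical helper, and it assumes the geocoder is any callable that returns an object with latitude and longitude attributes, or None when nothing is found. It also looks up each distinct address only once, which saves requests when addresses repeat.

```python
def geocode_unique(addresses, geocode):
    """Geocode each distinct address once and report the ones that failed.

    Hypothetical helper: `geocode` is any callable returning an object with
    .latitude and .longitude, or None when the address cannot be resolved.
    """
    coords = {}    # address -> (latitude, longitude)
    failed = []    # addresses that could not be resolved, in first-seen order
    seen = set()   # every address already sent to the geocoder
    for address in addresses:
        if address in seen:
            continue  # skip repeats instead of issuing another request
        seen.add(address)
        location = geocode(address)
        if location is None:
            failed.append(address)
        else:
            coords[address] = (location.latitude, location.longitude)
    return coords, failed
```

With the objects set up earlier, the call would look like coords, failed = geocode_unique(df['Property_Address'], gc), and the failed list shows exactly what is missing from the final map.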
Folium
Now on to the highlight of this series: plotting the data on a map. The tool I chose was folium, which allows plotting heatmaps, markers, and other data onto various types of maps from Python. Although I did experiment with heatmaps, the final plot uses circle markers. Importing folium and the pieces used here is accomplished as follows.
import folium
from folium.plugins import HeatMap
from folium import CircleMarker
The first step in creating a folium map is to pick a starting point and zoom level. With a little experimenting, I ended up with this.
map = folium.Map(location=[44.86023251970677, -93.73302265124681], zoom_start=12)
The resulting map, when generated, looks something like the one below. There are quite a few different map styles to choose from, but for the purposes of this project, I stuck with the default.
Circle Marker Details
I customized two things for the circle markers: the pop-up text and the color. The marker data is the sale price, so the overall map shows house prices at their respective latitude and longitude. I varied the color of the circle to give a gradient of house prices instead of relying on the size of the circle alone.
for index, row in df.iterrows():
    popup_text = "Price: {}<br> Latitude: {}<br> Longitude: {}"
    popup_text = popup_text.format(row['Sale_Price'],
                                   row['Latitude'],
                                   row['Longitude'])
    price = row['Sale_Price']
    if price < 100000:
        color = "#00FFFF"
    elif price >= 100000 and price < 300000:
        color = "#85CB33"
    elif price >= 300000 and price < 1000000:
        color = "#F9B700"
    elif price >= 1000000 and price < 5000000:
        color = "#FF33FF"
    else:
        color = "#FF0000"
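The threshold chain above can also be written as a lookup table, which makes it easier to add or move a price band. This is an alternative sketch rather than the code I used; the names PRICE_BREAKS, PRICE_COLORS, and price_color are made up for the example, while the breakpoints and colors are the same as in the loop.

```python
from bisect import bisect_right

# Upper bounds (exclusive) of each price band, in ascending order.
PRICE_BREAKS = [100_000, 300_000, 1_000_000, 5_000_000]
# One color per band; the last color covers everything at or above the top break.
PRICE_COLORS = ["#00FFFF", "#85CB33", "#F9B700", "#FF33FF", "#FF0000"]


def price_color(price):
    """Return the marker color for a sale price."""
    # bisect_right counts how many breaks are <= price, which is the band index.
    return PRICE_COLORS[bisect_right(PRICE_BREAKS, price)]
```

Inside the loop, the whole if/elif block would then reduce to color = price_color(price).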
Result
Generating the HTML for the circle marker plot is now quite simple. Still inside the loop, I set the radius to the price of the property divided by 1 million, attach the popup text, and set the color of the circle. Once the loop finishes, the map is saved to an HTML file.
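    # Still inside the for loop above: one marker per property
    CircleMarker([row['Latitude'], row['Longitude']],
                 radius=price / 1000000, fill=True,
                 color=color, popup=popup_text).add_to(map)
# After the loop: write the interactive map to disk
map.save('map.html')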
Map
Be patient as it loads…
As you can see, the map is interactive and gives the viewer a quick feel for property prices in a visually appealing way. The spread between the lowest and highest prices is huge, so it was difficult to figure out how to scale the circle radius and colors. Clicking on the center of a circle shows the latitude, longitude, and actual dollar value of the property.
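One way to tame a spread that wide, which I did not use for the map above, is to scale the radius logarithmically so that each tenfold jump in price adds the same amount of radius. The sketch below is only an illustration; log_radius and its default radius limits are assumptions, and it requires 0 < min_price < max_price.

```python
import math


def log_radius(price, min_price, max_price, min_radius=2.0, max_radius=20.0):
    """Map a price onto a radius between min_radius and max_radius, log-scaled.

    Hypothetical helper; assumes 0 < min_price < max_price.
    """
    # Clamp so outliers outside the chosen range still get a sensible radius.
    price = min(max(price, min_price), max_price)
    span = math.log10(max_price) - math.log10(min_price)
    fraction = (math.log10(price) - math.log10(min_price)) / span
    return min_radius + fraction * (max_radius - min_radius)
```

With a range of 50,000 to 5,000,000, a 500,000 property lands halfway between the smallest and largest circle, instead of being nearly invisible next to the most expensive ones.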
Conclusion
This series gave me a solid introduction to pandas, Selenium, BeautifulSoup, and folium. Their flexibility and parameterization, combined with their ease of use, enable quick and easy data manipulation, mining, and visualization. Feel free to drop me a comment if you have improvements or alternatives to the approaches I took. Thanks for reading!