Writing a Web Scraper to Search for Apartments

Ben Leamon
6 min read · Jan 10, 2021

My friend was looking to move into another unit in his UR apartment building. The previous tenant had vacated the premises, and the housing company was doing a deep clean. Management, however, stubbornly insisted they had no information about the unit, and all he could do was keep checking the website in case it became available. Apartments in his building go fast, and leases are awarded first come, first served, so applying quickly is of the essence. If only we could automate this process…

Part I: The web scraper

Writing the web scraper itself wasn’t too much work. The script loads the page for the apartment building, then checks for a message saying there are no rooms available. If that message is present, the program logs the run in a CSV file and exits.

If there are rooms, it will close any pop-ups helpfully suggesting that we get our application in fast, then find and record the text in all the elements displaying room numbers. Available room listings can span multiple pages, so the program will next check if there is a next page button, and if so click it and repeat the process. This goes on until there is no next page button, and consequently no more pages to cycle through.

Part II: Emailing the results

After the scraping finishes, the program closes the web page and extracts the actual room number from each full apartment listing. For example, “1号棟638号室” (building 1, room 638) becomes “638”. It then compares the list of available rooms to a list of rooms my friend is interested in, logs the time it ran and its full findings in a CSV file, and emails any hits to both me and my friend.
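The extraction itself is just string manipulation. A minimal sketch (the listing string here is made up, but it matches the site’s format):

listing = "1号棟638号室"  # hypothetical listing: 1号棟 = building 1, 638号室 = room 638
room = listing.split('号棟')[1].rstrip('号室')
print(room)  # -> 638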

Part III: Automating the script

None of this would be any good unless it ran automatically. I had a Raspberry Pi lying around from last Christmas and thought this would be the perfect time to put it to use. Getting things up and running on the Pi took a lot longer than I anticipated, longer, in fact, than writing the script itself. Troubleshooting the web scraper was a lot of fun; trying to sort out why the Python script didn’t work on the Raspberry Pi, not so much. Here are some of the things that took up more time than they should have. (Note: If you are used to coding on a Raspberry Pi, this section will likely be of very little value to you, but if it’s your first time, I hope you’ll find it useful.)

  • Connect via VNC: Rather than setting up a second keyboard and sacrificing your external monitor, it’s much easier to connect to the Pi over VNC (Virtual Network Computing) using VNC Viewer. You can enable the Pi’s VNC server with sudo raspi-config under Interface Options.
  • Google does not release an official ARM build of ChromeDriver to use with Selenium. You can, however, get a compatible driver by running sudo apt-get install chromium-chromedriver on your Pi; see the driver setup sketch after this list.
  • The Thonny IDE, the command line, and cron seem to use different PATHs (I’m about 50% sure this is actually correct). In practice, that means any file the script references needs a path built relative to the script itself rather than to the working directory. See the code below.
  • Cron runs tasks in the background with no display attached, which is a problem here because Selenium needs to open a browser window. Setting DISPLAY=:0.0 in the crontab fixes this; see the final cron entry below.
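Here is a minimal sketch of pointing Selenium at the apt-installed driver on the Pi. The path below is where the chromium-chromedriver package typically lands on Raspberry Pi OS; that’s an assumption, so verify yours with which chromedriver:

from selenium import webdriver

# Assumed install location for the chromium-chromedriver apt package;
# confirm on your own Pi with `which chromedriver`.
driver = webdriver.Chrome('/usr/lib/chromium-browser/chromedriver')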

Code:

As is about to become abundantly obvious, I am not the world’s best Python coder. In particular, I’d like to draw attention to the fixed waits in the Selenium code and the global variables I use to save the rooms. I tried using explicit waits, waiting until certain elements had loaded on the page before progressing, but I couldn’t get them to work as hoped. As for the variables, I’m sure a little research would reveal a more elegant solution, but ultimately this project was time sensitive and I wanted to get it up ASAP.
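For the curious, here is roughly what the wait I was going for would look like. This is a minimal sketch using the rep_room-name class the scraper already targets, not what made it into the final script:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

try:
    # Wait up to 20 seconds for at least one room listing to appear,
    # instead of sleeping for a fixed interval.
    WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.CLASS_NAME, "rep_room-name"))
    )
except TimeoutException:
    print("Listings never appeared; treating the page as empty.")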

Config file:

[EMAIL]
user = user name you'll be sending from
password = password for that email
recipient = recipient1@gmail.com, recipient2@gmail.com
[APARTMENT]
url = url for the UR building you're interested in
interested = 410, 808, 111, 532, 4
[DRIVER]
# Note: I needed this on my mac, but not on the Pi.
path = path to driver
[MANAGEMENT]
logpath = path to your log.csv
logemail = email to send the log to

Scraper:

# Import packages 
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.common.exceptions import NoSuchElementException
import time
import datetime
import smtplib
import configparser
import csv
import os
# get properties from my config file.
config = configparser.ConfigParser()
# Note: we need to link to the config file like this, otherwise it won't load when executed by Cron.
config.read(os.path.join(os.path.dirname(__file__), 'properties.ini'))
# List of rooms we want to watch out for
interestedRooms = config['APARTMENT']['interested'].replace(' ', '').split(',')
# log file
log = config['MANAGEMENT']['logpath']
# Gmail credentials
gmailUser = config['EMAIL']['user']
gmailPassword = config['EMAIL']['password']
to = config['EMAIL']['recipient'].replace(' ', '').split(',')
url = config['APARTMENT']['url']
# Set up the web driver. On the Pi the apt-installed chromedriver is on
# PATH, so no path argument is needed; on my Mac I had to pass it
# explicitly: webdriver.Chrome(path)
path = config['DRIVER']['path']
driver = webdriver.Chrome()
# Maximize window
driver.maximize_window()
# Flag to stop scraping
endScraping = False
# List of all available rooms in the scraped-building
allRooms = []
# List of just the room numbers for the available rooms
parsedRooms = []
# List of rooms that are available and we are interested in
applyRooms = []
# email properties
sentFrom = gmailUser
def save_rooms():
    '''Takes a page of UR room listings, extracts the room numbers, and adds them to the list allRooms'''
    availableRooms = driver.find_elements_by_class_name("rep_room-name")
    for room in availableRooms:
        allRooms.append(room.text)

def next_page():
    '''Checks to see if there is a next page button in the document, and if so clicks it.'''
    global endScraping
    try:
        # '次へ' is the link text for "Next"
        nextButton = driver.find_element_by_partial_link_text('次へ')
        nextButton.click()
        time.sleep(10)
    except NoSuchElementException:
        # No next-page button means this is the last page
        endScraping = True

def extract_rooms():
    '''Takes the string listing building and room number and extracts just the room number'''
    global parsedRooms
    for room in allRooms:
        # "1号棟638号室" -> "638"
        parsedRooms.append(room.split('号棟')[1].rstrip('号室'))

def check_rooms():
    '''Check the room numbers we're interested in against the list of available rooms'''
    global applyRooms
    for room in interestedRooms:
        if str(room) in parsedRooms:
            applyRooms.append(room)
def email():
    '''Send an email with a list of rooms available for rent'''
    # Content of the email. The backslash after the opening quotes keeps
    # "Subject:" on the first line, where smtplib expects headers.
    emailText = f"""\
Subject: Hey there!

The following apartments are available. {applyRooms}
"""
    for recipient in to:
        try:
            # Send the email over SSL
            server = smtplib.SMTP_SSL('smtp.gmail.com', 465)
            server.ehlo()
            server.login(gmailUser, gmailPassword)
            server.sendmail(sentFrom, recipient, emailText)
            server.close()
        except Exception as e:
            print(e)
            print('something went wrong.')
def add_log(lpath, message):
    '''Append a timestamped row with the scrape results to the CSV log'''
    currentTime = datetime.datetime.now()
    with open(lpath, mode='a') as logfile:
        log_writer = csv.writer(logfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        log_writer.writerow([currentTime, allRooms, applyRooms, message])
def kill_popup():
    '''Checks to see if there is a popup on screen and clicks the close button'''
    try:
        popupClose = driver.find_element_by_class_name("item_button_close")
        popupClose.click()
        time.sleep(2)
    except NoSuchElementException:
        return
# Open a window, go to the URL
driver.get(url)
# Wait for the page to load (could probably make this more sophisticated)
time.sleep(20)
# Find the notification that says there are no available rooms in a building
noRooms = driver.find_element_by_css_selector('section.article_vacancy > div > p').text
# If there are no rooms ("現在、当サイトからご案内できる空室はございません" means
# "There are currently no vacant rooms available through this site"), log and end.
if noRooms[0:24] == "現在、当サイトからご案内できる空室はございません":
    print("No Rooms")
    add_log(log, "No Rooms")
    driver.close()
else:
    # Iterate through the pages until you reach the last page, saving the rooms on each one.
    while not endScraping:
        kill_popup()
        save_rooms()
        next_page()
    # Now that scraping is finished, extract room numbers, check them against
    # the list of interested rooms, and send the email if there are any hits.
    print('scraping finished...')
    extract_rooms()
    check_rooms()
    add_log(log, "Rooms detected")
    if applyRooms:
        email()
    # Close the driver
    driver.close()

The Cron task:

DISPLAY=:0.0
0 * * * * cd /home/pi/Documents/ur-scraper && python3 scraper-1.py
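If you haven’t used cron before: the two lines above go into your user crontab, and the DISPLAY=:0.0 line tells the browser to open its window on the Pi’s own desktop session (display :0), which is what lets the job run outside a terminal.

# Open your crontab for editing and paste in the lines above:
crontab -e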
