Welcome to the first blog in our series, "Creating an Apartment Rent Price end-to-end app." If you're new to this project, we recommend reading the overview blog first to understand the process better. In this initial step, we'll focus on securing a suitable and relevant dataset, a common first step in most Data Science projects. We'll do this by using web scraping to gather the necessary data. If web scraping is unfamiliar to you, don't worry! Skim through the web scraping tutorial we prepared earlier to familiarize yourself with the concept before diving in. Let's start creating our dataset for the Apartment Rent Pricing App!
The code for this tutorial is available on GitHub.
Getting Started with Web Scraping: Gathering Apartment Listings from https://www.propertypro.ng/
In this initial phase of building our Apartment Rent Pricing App, we will use web scraping to extract apartment listings from the website https://www.propertypro.ng/. We aim to save this data in CSV files for future use. To extract the required information efficiently, we will focus on the website's layout and structure. Please look at the website to familiarize yourself with its appearance before we begin the web scraping process. Let's prepare to gather the loot and save it for further use in our project!
To gather the relevant apartment data for our project, we will follow a series of steps on the website https://www.propertypro.ng/. First, we'll click on the "Rent" option, then type "Lagos" in the search bar located below the "Rent" section. Afterward, we will click the "Type" dropdown menu and select "Flats and Apartments." Finally, we'll click on the "Search" button. The resulting page will display the filtered apartment listings, serving as our primary data source for the Apartment Rent Pricing App development.
Upon scrolling down, you will encounter a display similar to the following:
As we examine the layout of the website, we can observe that each apartment listing consists of the following essential information:
We aim to retrieve these specific details for each apartment listing on the site.
We will leverage the Python Requests library to retrieve the webpage data and then use BeautifulSoup to parse the HTML and extract the desired information to accomplish this task. Let's start by importing the necessary libraries for our web scraping process.
As we embark on the web scraping process, it's always good to have Numpy and Pandas on hand, as they might prove helpful at any point in our project, even if not immediately.
To begin, copy the URL of the current results page and pass it to the get function of the Requests library. Once we have retrieved the contents of the page, we parse them with BeautifulSoup:
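A minimal sketch of this step is shown here. The `url` value is only a placeholder (the site's home page); in practice you would paste the URL copied from the filtered results page. The helper names `fetch_soup` and `parse_html`, the timeout value, and the choice of Python's built-in `html.parser` are assumptions made for illustration.

```python
import requests
from bs4 import BeautifulSoup


def parse_html(html):
    """Parse an HTML string (or bytes) with Python's built-in parser."""
    return BeautifulSoup(html, "html.parser")


def fetch_soup(url, timeout=30):
    """Download a page and return it as a BeautifulSoup object."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # stop early on HTTP errors such as 404
    return parse_html(response.content)


# Placeholder: replace with the URL copied from the filtered results page.
url = "https://www.propertypro.ng/"

# soup = fetch_soup(url)   # uncomment to fetch the live page
# print(soup.prettify())   # inspect the parsed HTML

# Offline check that parsing works, using a tiny hand-written snippet.
sample_soup = parse_html("<html><body><h4>Ikota Lekki Lagos</h4></body></html>")
print(sample_soup.h4.text)
```

The network call is left commented out so the snippet can be run without hitting the site; the hand-written sample only confirms that the parsing step behaves as expected.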
Printing out soup will allow us to check its content and structure. Here's what we would get:
To inspect the page elements and figure out how to parse the data, follow these steps:
Return to the website's apartment rent listings page (the one we opened earlier).
Right-click anywhere on the page.
Click "Inspect" in the context menu. This opens the browser's developer tools.
The window should now split into two sections: one showing the page itself and one showing its HTML code. Hovering over an element in the HTML highlights the corresponding part of the page.
By inspecting the page, you can identify the HTML tags and classes that encapsulate the relevant information, such as the apartment title, address, house perks, description, and details of bedrooms, baths, and toilets. This information will guide us in writing the code to extract the data from each apartment listing on the webpage.
To use the inspect tool effectively, follow these steps:
Following these steps, you can identify the specific HTML elements and classes associated with the data you want to scrape. This information will guide us in creating the necessary code to extract the relevant details from the apartment listings.
Upon clicking on a dropdown icon alongside the selected data in the developer tools, you can explore further and observe the HTML tags that encapsulate each piece of information:
Understanding the HTML structure of the webpage is crucial as it enables us to target the correct elements during the web scraping process. With this knowledge, we can write the code that extracts the required data from each apartment listing on the website.
Let's proceed with building the code to extract all the individual specifications of each apartment listing. We'll use the information we gathered from inspecting the webpage to target the relevant HTML elements. Here's how we can achieve it:
listing_divs = soup.select('div[class="single-room-sale listings-property"]')
This code uses BeautifulSoup's select method to retrieve all the div elements whose class attribute has the value "single-room-sale listings-property" and saves them in a list named listing_divs. Next, let's check the number of elements in listing_divs using the len function:
print("Number of apartment listings on a page:", len(listing_divs))
Here are the results:
Number of apartment listings on a page: 20
The 20 elements in listing_divs correspond to the 20 apartment listings on the page. Each element represents an individual apartment listing along with its details.
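To see how the attribute selector behaves, here is a small sketch run against hand-written HTML rather than the live page (the sample markup and listing names are assumptions). It compares an exact match on the class attribute with a class selector, which matches regardless of the order of the class names.

```python
from bs4 import BeautifulSoup

# Hand-written sample; the real page has far more markup per listing.
html = """
<div class="single-room-sale listings-property"><h4>Listing A</h4></div>
<div class="single-room-sale listings-property"><h4>Listing B</h4></div>
<div class="listings-property single-room-sale"><h4>Listing C</h4></div>
<div class="sidebar"><h4>Not a listing</h4></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact match on the whole class attribute value (order matters).
exact = soup.select('div[class="single-room-sale listings-property"]')

# Class selector: matches any div carrying both classes, in any order.
by_class = soup.select("div.single-room-sale.listings-property")

print("Exact attribute match:", len(exact))
print("Class selector match:", len(by_class))
```

The exact match is the stricter of the two. If the site ever reorders or adds class names on a listing, the class selector form would still find it.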
Now, let's extract the features we need from the first element in listing_divs:
listing_divs[0]
This gives the following output:
Having examined the HTML structure and identified the relevant tags, we can now extract the features we need from each apartment listing. Let's extract them one by one:
listing_divs[0].select('h4')[0].text
This snippet retrieves the address, which is enclosed in an h4 tag, from the first element in the listing_divs list.
It outputs the address of the first apartment listing. We will repeat this process to extract the other features, such as price, description, and details of bedrooms, baths, and toilets, from the same listing.
'Ikota Lekki Lagos'
That’s very easy! After that, let’s try and scrape the pricing tag:
listing_divs[0].select('h3[class*=listings-price]')[0].text.strip()
This snippet retrieves the price, which is enclosed in an h3 tag whose class contains "listings-price". The strip() method removes any leading and trailing whitespace around the extracted value:
'N 2,800,000'
Next is total bedrooms, toilets & bathrooms:
listing_divs[0].select('div[class*=fur-areea]')[0].text.strip().split('\n')
This snippet retrieves all three room details (bedrooms, baths, and toilets) from the div tag whose class name contains "fur-areea". The text attribute returns the entire content of the div as a single string. We then use strip() to remove leading and trailing whitespace, and finally split the string on newline characters to separate the individual features:
['3 beds', '3 baths', '4 Toilets']
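The strip-then-split chain can be checked on its own with a plain string. The raw text here is an assumption about what the div's text looks like before cleaning, with each value on its own line.

```python
# Assumed raw text of the "fur-areea" div, with each value on its own line.
raw_text = "\n3 beds\n3 baths\n4 Toilets\n"

stripped = raw_text.strip()     # removes the leading and trailing newlines
rooms = stripped.split("\n")    # one list item per remaining line

print(rooms)  # ['3 beds', '3 baths', '4 Toilets']
```

Stripping first matters: splitting the raw text directly would leave empty strings at both ends of the list.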
The last characteristic to scrape is the description data.
The "Serviced" label enclosed in purple and the line directly under it can provide valuable additional information about the apartment that might not be evident from the price and number of rooms alone. Let's retrieve the "Serviced" label, this time from the eighth listing (index 7):
listing_divs[7].select('div[class*=furnished-btn]')[0].text.replace('\n', ' ').strip()
This retrieves the labels from the div tag whose class name contains "furnished-btn". We replace newline characters with a single space and then strip off any leading and trailing whitespace:
'Serviced Newly Built'
Let's proceed with cleaning up the line directly under the "Serviced" and "Newly Built" themes. We'll remove any leading and trailing whitespace to get a cleaner version of the information:
listing_divs[7].select('div[class*=result-list-details]')[0].p.text.replace('Read more', '').replace('FOR RENT:', '').strip()
This selects the div tag whose class name contains "result-list-details", reads the text of the p tag inside it, and removes "Read more" and "FOR RENT:" by replacing them with empty strings. Finally, we strip off any extra whitespace:
3 bedroom Flat/Apartment for rent Old Ikoyi Ikoyi Lagos...
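The description cleanup can be tried offline on a hand-written stand-in for one listing's details block. The sample markup, including the extra `col-md-12` class, is an assumption; only the "result-list-details" class name and the two removed phrases come from the steps above.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for one listing's details block.
html = """
<div class="col-md-12 result-list-details">
<p>FOR RENT: 3 bedroom Flat/Apartment for rent Old Ikoyi Ikoyi Lagos... Read more</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# [class*=...] matches when the class attribute contains the substring.
details_div = soup.select("div[class*=result-list-details]")[0]

# .p returns the first <p> inside the div.
description = (
    details_div.p.text
    .replace("Read more", "")
    .replace("FOR RENT:", "")
    .strip()
)
print(description)
```

Because the selector uses a substring match, it still finds the div when other class names sit alongside "result-list-details".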
Let's put all the code together to extract the necessary information from every apartment listing on the page, and walk through it step by step:
The code above will output a DataFrame containing the extracted information for all 20 apartment listings on the webpage. The DataFrame will have columns for Address, Price, Rooms (beds, baths, and toilets), and Description & Extra details. Now you can further analyze or manipulate the data as needed using Pandas!
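The combined extraction can be sketched as follows, run against a hand-written two-listing sample instead of the live page. The sample values, the column names, and the helpers `first_text` and `extract_listing` are assumptions. The sketch also swaps the `[0]` indexing used earlier for `select_one` with an empty-string fallback, so a listing that lacks one of the blocks does not raise an IndexError.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hand-written sample; class names follow the snippets above.
html = """
<div class="single-room-sale listings-property">
<h4>Ikota Lekki Lagos</h4>
<h3 class="listings-price"> N 2,800,000 </h3>
<div class="fur-areea">
<span>3 beds</span>
<span>3 baths</span>
<span>4 Toilets</span>
</div>
<div class="furnished-btn">
<a>Serviced</a>
<a>Newly Built</a>
</div>
<div class="result-list-details"><p>FOR RENT: 3 bedroom flat... Read more</p></div>
</div>
<div class="single-room-sale listings-property">
<h4>Old Ikoyi Ikoyi Lagos</h4>
<h3 class="listings-price">N 2,500,000</h3>
<div class="fur-areea">
<span>2 beds</span>
<span>2 baths</span>
<span>3 Toilets</span>
</div>
<div class="result-list-details"><p>FOR RENT: 2 bedroom flat... Read more</p></div>
</div>
"""


def first_text(tag, selector):
    """Text of the first element matching selector inside tag, or ''."""
    match = tag.select_one(selector)
    return match.text if match else ""


def extract_listing(listing):
    """Pull the fields discussed above out of one listing div."""
    description = first_text(listing, "div[class*=result-list-details] p")
    description = description.replace("Read more", "").replace("FOR RENT:", "")
    extras = first_text(listing, "div[class*=furnished-btn]")
    return {
        "Address": first_text(listing, "h4").strip(),
        "Price": first_text(listing, "h3[class*=listings-price]").strip(),
        "Rooms": first_text(listing, "div[class*=fur-areea]").strip().split("\n"),
        "Extra details": extras.replace("\n", " ").strip(),
        "Description": description.strip(),
    }


soup = BeautifulSoup(html, "html.parser")
listing_divs = soup.select('div[class="single-room-sale listings-property"]')

# One dict per listing becomes one row of the DataFrame.
df = pd.DataFrame([extract_listing(listing) for listing in listing_divs])
print(df)
```

The second sample listing has no "furnished-btn" block, so its "Extra details" cell ends up as an empty string rather than stopping the loop.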
Let's wrap up the task by creating a dynamic function that retrieves apartment listings based on the city's name, converts the data into a Pandas DataFrame, and saves it locally as a CSV file.
Here's a high-level overview of the function, parse_listing_data:
The parse_listing_data function is designed to scrape apartment listings data from a web page based on the provided location (city). It allows for an optional parameter max_price, which acts as a filter to exclude overpriced apartments. The function also accepts the number of apartment listings num_listings that the user wants to retrieve.
Here's a summary of the function's workflow:
Initialize an empty list called all_listings_data and set page_num to 0.
Enter a while loop that iterates until the number of listings scraped reaches the desired num_listings. If num_listings is 200, it will scrape 10 pages (assuming each page contains 20 apartment listings).
Build the URL for each page by combining the base URL with the location, max_price, and page_num using string concatenation.
Use the requests.get function to retrieve the HTML data from the URL and parse it using BeautifulSoup.
Select all the apartment listings on the page. The loop terminates if there are no listings (length of listing_divs is 0).
Loop through each listing and extract its address, price, number of bedrooms, bathrooms, toilets, and description.
Append the extracted data for each listing to the all_listings_data list.
Increment page_num by 1 to move to the next page in the loop.
After scraping the desired number of pages, convert all_listings_data into a Pandas DataFrame with appropriate column names.
Save the DataFrame as a CSV file, with the filename suffixed by the provided location name.
Return the DataFrame as the output of the function.
Following this process, the function can dynamically scrape apartment listings data for the specified location, filter out high-priced apartments if required, and store the data in a Pandas DataFrame and CSV file for further analysis or usage.
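The workflow above can be sketched as follows. This is not the original implementation: the URL pattern in `BASE_URL` and `build_url` is hypothetical and must be replaced with the one shown in the browser's address bar after a search, and the CSV filename, column names, and helper functions are likewise assumptions. The `fetch` and `save_csv` parameters are additions that allow the function to be exercised offline with a stand-in for `requests.get`.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# ASSUMPTION: hypothetical URL pattern. Copy the real one from the browser's
# address bar after running a search on the site.
BASE_URL = "https://www.propertypro.ng/property-for-rent/flat-apartment/in/"
LISTING_SELECTOR = 'div[class="single-room-sale listings-property"]'
COLUMNS = ["Address", "Price", "Rooms", "Extra details", "Description"]


def build_url(location, max_price, page_num):
    """Build a results-page URL by string concatenation (assumed format)."""
    url = BASE_URL + location + "?"
    if max_price is not None:
        url += "max_price=" + str(max_price) + "&"
    return url + "page=" + str(page_num)


def first_text(tag, selector):
    """Text of the first element matching selector inside tag, or ''."""
    match = tag.select_one(selector)
    return match.text if match else ""


def extract_listing(listing):
    """Return one row as a dict keyed by the names in COLUMNS."""
    description = first_text(listing, "div[class*=result-list-details] p")
    description = description.replace("Read more", "").replace("FOR RENT:", "")
    extras = first_text(listing, "div[class*=furnished-btn]")
    return {
        "Address": first_text(listing, "h4").strip(),
        "Price": first_text(listing, "h3[class*=listings-price]").strip(),
        "Rooms": first_text(listing, "div[class*=fur-areea]").strip().split("\n"),
        "Extra details": extras.replace("\n", " ").strip(),
        "Description": description.strip(),
    }


def parse_listing_data(location, max_price=None, num_listings=20,
                       fetch=requests.get, save_csv=True):
    """Scrape up to num_listings listings for location into a DataFrame."""
    all_listings_data = []
    page_num = 0
    while len(all_listings_data) < num_listings:
        response = fetch(build_url(location, max_price, page_num))
        soup = BeautifulSoup(response.content, "html.parser")
        listing_divs = soup.select(LISTING_SELECTOR)
        if len(listing_divs) == 0:
            break  # ran out of results before reaching num_listings
        for listing in listing_divs:
            all_listings_data.append(extract_listing(listing))
        page_num += 1

    # Trim any overshoot from the last page, then build the table.
    df = pd.DataFrame(all_listings_data[:num_listings], columns=COLUMNS)
    if save_csv:
        # ASSUMPTION: filename format; the location is used as the suffix.
        df.to_csv("listings_" + location + ".csv", index=False)
    return df
```

Counting collected listings rather than pages keeps the loop condition tied directly to num_listings, and the early break covers locations that have fewer results than requested.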
To test the function for the city of Lagos, follow these steps:
Define the necessary arguments: location (set to "Lagos"), max_price (optional, set to a specific maximum price to filter out overpriced apartments), and num_listings (the number of apartment listings you want to retrieve).
Call the parse_listing_data function, passing the above-defined arguments for Lagos.
The function will scrape apartment listings data for Lagos based on the specified parameters (max_price and num_listings).
The function will return a Pandas DataFrame containing the scraped data.
You can display the DataFrame to see the retrieved apartment listings data for Lagos.
By following these steps, you can easily test the function for different cities by providing the appropriate location name and other relevant arguments.
lagos_data = parse_listing_data('lagos', 2500000, 5000)
lagos_data
Here, we pass in lagos as our preferred location, 2.5 million naira as our maximum price, and 5000 as the number of rows we want. Here’s our result:
Next, let's test the function for Ibadan by passing "oyo" (the state Ibadan belongs to) as the location, with a maximum apartment price of 3 million naira and 2000 rows of data. The steps are the same as for Lagos: pass location, max_price, and num_listings to parse_listing_data, which returns a Pandas DataFrame and saves the CSV file locally:
ibadan_data = parse_listing_data('oyo', 3000000, 2000)
ibadan_data
Let's test the function for the cities of Abuja, Ogun, and Port Harcourt:
Assuming the function is defined as mentioned before, we can use it to retrieve apartment listings data for these cities as follows:
We have now tested the function for different cities, and here are the results:
With that, we have built a dynamic web scraping function that retrieves apartment listings data for various cities, filters by price if needed, and stores the data in a Pandas DataFrame and a CSV file.
In this tutorial, we walked through the step-by-step process of building apartment listings datasets by scraping data from Nigeria's top real estate listings website. The code for this tutorial is available on GitHub for easy reference.
The next crucial phase is Data Wrangling, where we will focus on cleaning and preparing the scraped data into a format suitable for further analysis. This step typically consumes a significant portion of our development time, accounting for over 40% of the total effort.
Thank you for following along! If you have any questions or need assistance, feel free to contact Actowiz Solutions. We offer mobile app scraping, instant data scraper, and web scraping services to cater to your specific requirements.
Until next time, happy coding and data wrangling! Goodbye, folks!