In this blog, we will use Python, Twilio, and Heroku to extract data from a grocery website's API and receive a text notification when delivery slots become available.
We live in extraordinary times.
And with extraordinary times come different challenges. One such challenge was preserving grocery supply chains with millions of people under lockdown due to Covid-19. For vulnerable people who are isolated or unable to go to the supermarket physically, the only accessible option is booking a supermarket delivery time slot online. However, with massive demand for these services, it has become notoriously difficult to get an available slot, leaving many people logging in repeatedly to check for one.
That got us thinking about the ever-increasing number of problems we face and how we could use Python to automate this process for us.
The first step towards our goal of an 'automated delivery slot checker' is working out how to programmatically scrape the data we need from a grocery website.
After choosing ASDA as our grocery site, creating an account, and entering the delivery postcode, we arrive at the delivery slot page.
Here we can see a neatly laid-out table of the dates, times, and availability of every slot. Naturally, all the slots are currently showing 'Sold Out.' However, we now have a clear view of the data we need to capture with our tool.
If you've done any data scraping before or worked in web development, you'll be familiar with the DevTools built into most major browsers. For those who are not, DevTools is a set of tools that lets users inspect a webpage and study its HTML, CSS, JavaScript, and, critically for this project, the metadata associated with network requests made between the webpage and the server. The following step is perhaps the most important one.
With the DevTools window open, we can start to see what is happening behind the scenes to give us an up-to-date table of slot availability. Navigating to the 'Network' tab of the DevTools window, we get access to all the network requests made by the website to fetch the latest data displayed. Refreshing the webpage produces a list of requests, one of which must hold the key to where the slot availability data is coming from.
This list may look a bit confusing at first, because it contains a sea of different requests, fetching everything from the CSS that describes the page formatting to the JavaScript that determines the site's functionality. We are only interested in the data presented on the webpage. So, filtering for requests of type 'XHR' (XMLHttpRequest) lets us concentrate on requests for data from the server, ignoring those concerned with page styling. That still leaves several requests to inspect; luckily, gambling that the request we want contains the word 'slot' narrows the search to four remaining requests.
Clicking on a request and selecting the 'Response' tab reveals the JSON response produced by that request and, therefore, the data provided to the webpage. From that, we can quickly see that the request carrying the data we are looking for is the POST request to the URL https://groceries.asda.com/api/v3/slot/view. Looking at the 'Params' tab, we can also see the JSON data sent by the browser in the POST request. Right-clicking and selecting 'Copy All' copies that JSON data to the clipboard, which gives us everything we need to tell Python how to fetch the data.
Python's Requests library makes it very easy to make HTTP requests programmatically. From our inspection of the website, we know the URL we want to send the request to, the type of request we want to use (POST), and the JSON data we need to send (currently stored in the clipboard).
In practice, this gives us a short script.
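A minimal sketch of such a script, under stated assumptions, might look like the following. The URL and the POST method come from the DevTools inspection; the payload field names, the date format string, and the environment variable names (POSTCODE, ACCOUNT_ID) are hypothetical placeholders for whatever JSON the browser actually sends, so they should be replaced with the data copied from the 'Params' tab.

```python
import os
from datetime import datetime, timedelta

URL = "https://groceries.asda.com/api/v3/slot/view"

# Assumed date format; match it to the format seen in the copied JSON.
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"


def build_payload(now, env=os.environ):
    """Build the JSON body for the slot request (field names are hypothetical)."""
    start_date = now.strftime(DATE_FORMAT)
    # Always look two weeks ahead of the current date.
    end_date = (now + timedelta(days=14)).strftime(DATE_FORMAT)
    return {
        "data": {
            "start_date": start_date,
            "end_date": end_date,
            # Sensitive values are read from the environment, not hard-coded.
            "postcode": env["POSTCODE"],
            "account_id": env["ACCOUNT_ID"],
        }
    }


def fetch_slots(payload):
    """POST the payload to the slot endpoint and return the response object."""
    # Imported here so build_payload can be used without the dependency.
    import requests  # third-party: pip install requests

    return requests.post(URL, json=payload, timeout=30)
```

Calling fetch_slots(build_payload(datetime.now())) would return the response object that the rest of this post refers to as r.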
We have pasted the JSON data from the clipboard and added a simple request, posting the data to the URL with the json argument of the requests.post() function. The resulting response object is stored in the variable r for later use.
We have also replaced some of the parameters in the data with variables. The start_date and end_date variables give the API a dynamic date range, because we are always interested in looking two weeks ahead of the current date. The strftime method of datetime objects lets us specify the exact string format needed for the dates, which we can match to the format we saw in the JSON data copied earlier.
The parameters read from os.environ are sensitive pieces of information that we don't want to make publicly available on GitHub. Later on, we will see how to store these values safely so that our script can access them.
We now have a fully working Python script that we can use to send requests to Asda's API and store the response object we get back. Let's look at the response object and work out how to parse it to extract the data we're after.
Our response object r holds all the data and metadata returned from the POST request to Asda's API. We first need to check whether our request to the server was successful or whether something went wrong. To do that, we can examine the status_code attribute of the response object.
If it doesn't return 200, the request has gone wrong, and we have to double-check that the URL and data are correctly formatted. There are many possible HTTP status codes, but generally we will see 200 for 'OK', and 400 or 404 for 'Bad Request' and 'Not Found' respectively.
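As a small illustration of this check, the sketch below turns a status code into a readable summary using Python's standard http module. The helper name describe_status and the wording of its messages are our own assumptions, not part of Requests or of Asda's API.

```python
from http import HTTPStatus


def describe_status(status_code):
    """Return a short human-readable summary of an HTTP status code."""
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        # Non-standard codes are not known to the standard library.
        phrase = "Unknown"
    if status_code == HTTPStatus.OK:
        return f"{status_code} {phrase}: request succeeded"
    return f"{status_code} {phrase}: check the URL and the JSON payload"


print(describe_status(200))
print(describe_status(400))
print(describe_status(404))
```

With a real response this would be called as describe_status(r.status_code). Requests also offers r.raise_for_status(), which raises an exception for 4xx and 5xx responses instead of returning a code to inspect.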
Assuming that we have a 200 status code, we are ready to look at the data in the response. As it is very common to receive data in JSON format, Requests comes with a built-in JSON decoder.
Printing the value of r.json() to the terminal quickly reveals that we have received a large amount of data from the server relating to slot availability, pricing, capacity, etc. As we are mainly interested in slot availability for this project, we can loop through the JSON response and fill a dictionary with slots and their availability.
We first loop through every slot day within the two weeks we searched for, and then within each day loop through every individual slot, filling the dictionary as we go.
Now that we have all the data we need, and a way of extracting it programmatically whenever we wish, let's look at how we could notify ourselves when a delivery slot becomes available.
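The nested loop described above could be sketched roughly as follows. The response shape used here (the keys slot_days, slots, slot_info, start_time, and status, plus the timestamp format and status values) is an assumption made for illustration; the real keys should be read off the JSON seen in the 'Response' tab.

```python
from datetime import datetime

# Hypothetical response shape, for illustration only.
SAMPLE_RESPONSE = {
    "data": {
        "slot_days": [
            {
                "slots": [
                    {"slot_info": {"start_time": "2020-04-01T08:00:00Z",
                                   "status": "UNAVAILABLE"}},
                    {"slot_info": {"start_time": "2020-04-01T10:00:00Z",
                                   "status": "AVAILABLE"}},
                ]
            },
            {
                "slots": [
                    {"slot_info": {"start_time": "2020-04-02T08:00:00Z",
                                   "status": "UNAVAILABLE"}},
                ]
            },
        ]
    }
}


def parse_slots(response_json):
    """Map each slot's start time to its availability status."""
    slot_data = {}
    # Outer loop: one entry per day in the requested date range.
    for slot_day in response_json["data"]["slot_days"]:
        # Inner loop: every individual slot within that day.
        for slot in slot_day["slots"]:
            info = slot["slot_info"]
            start = datetime.strptime(info["start_time"], "%Y-%m-%dT%H:%M:%SZ")
            slot_data[start] = info["status"]
    return slot_data


slot_data = parse_slots(SAMPLE_RESPONSE)
```

With a live response, parse_slots(r.json()) would take the place of the sample data.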
Twilio is a cloud communications platform providing APIs that allow developers to send and receive text messages and phone calls in projects and apps. It opens up a whole world of possibilities for automated SMS notifications, two-factor authentication, chatbots, etc. Here, we will build a simple text notification system, so that we receive a text with the details of any available delivery slots whenever the script runs.
Though Twilio is a paid service, it provides a free trial credit of about £13. To get started with Twilio, we have to sign up on the website (no payment details needed) and select a phone number. Once that is done, Twilio will give us an account SID and an authentication token for the project. The credit is more than enough to get us started, given that it costs roughly £0.08 to send a text.
Once we are all set with the Twilio account, we can start using the Python API provided by Twilio. The Twilio module for Python can be installed using pip.
The Twilio API for Python is straightforward to get started with, and extensive documentation is available at https://www.twilio.com/docs. To send a text from our newly acquired phone number, we need the account SID and authentication token, the Twilio number to send from, the number to send to, and the message body.
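A minimal sketch of sending a text with Twilio's Python helper library is given below. Client and client.messages.create come from the twilio package; the function names make_client and send_text are our own, and any phone numbers passed in are placeholders.

```python
def make_client(account_sid, auth_token):
    """Create a Twilio REST client from the account SID and auth token."""
    # Imported here so send_text can be exercised without the dependency.
    from twilio.rest import Client  # third-party: pip install twilio

    return Client(account_sid, auth_token)


def send_text(client, to_number, from_number, body):
    """Send one SMS through the given Twilio client and return the message."""
    # from_ has a trailing underscore because "from" is a Python keyword.
    return client.messages.create(to=to_number, from_=from_number, body=body)
```

With real credentials, this would be called as send_text(make_client(sid, token), recipient, twilio_number, "Slots available"), where sid, token, and both numbers are read from the environment rather than written into the script.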
Adding this to our script for fetching delivery slots, we can check the data for available slots and, if any exist, send a text to the phone number of our choice with a message of our choosing. This is outlined in the last segment of the script.
We now have a complete script that lets us check for available delivery slots with Asda and, if any are available, get a text notification to tell us. The only remaining step in the project is to find a way of having the script run on its own on a schedule.
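One way this final segment might be sketched is shown below. It assumes the slot dictionary maps start times to status strings and that an available slot is marked "AVAILABLE"; both are assumptions about the API's data, and the message wording is our own. The send argument is any function that delivers a message body, such as a small wrapper around Twilio.

```python
from datetime import datetime


def available_slots(slot_data, available_status="AVAILABLE"):
    """Return the start times of slots whose status marks them as bookable."""
    return sorted(t for t, status in slot_data.items() if status == available_status)


def build_message(times):
    """Format the available slot times into the body of a text message."""
    lines = [t.strftime("%Y-%m-%d %H:%M") for t in times]
    return "Delivery slots available:\n" + "\n".join(lines)


def notify_if_available(slot_data, send):
    """Call send(body) only when at least one slot is available."""
    times = available_slots(slot_data)
    if not times:
        return False
    send(build_message(times))
    return True


# Example with made-up data; print stands in for the real SMS sender.
example = {
    datetime(2020, 4, 1, 10, 0): "UNAVAILABLE",
    datetime(2020, 4, 1, 12, 0): "AVAILABLE",
}
notify_if_available(example, print)
```

Passing the sender in as a function keeps the slot logic separate from Twilio, so it can be tried out locally without spending any trial credit.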
Heroku is a cloud-computing platform that allows developers to deploy projects and apps to the cloud. It is especially useful for running web apps with minimal set-up, making it perfect for personal projects. Here we will use Heroku as an easy way to get our script running at scheduled intervals.
You can sign up for an account on the Heroku website.
The first step is to create a new app to house our project.
To get the script up and running in the cloud, we have to create a new GitHub repository containing our script. You can find ours here for your reference. We also have to create a file called requirements.txt. It lists all the package dependencies that Heroku needs to install before it can successfully run the script.
Then, we can connect the app to the GitHub repository created for this project. Enabling 'automatic deploys' means that whenever we push to the main branch, the project is automatically redeployed with the latest updates, which is helpful if we wish to continue developing the project while it is in production.
As mentioned earlier, there are several variables in the script that we wish to keep secret. We can do that using 'Config vars' in the Heroku app settings, a simple way of storing sensitive data in the project that can then be accessed like environment variables.
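Because config vars appear to the script as ordinary environment variables, they can be read through os.environ. The sketch below checks that everything needed is present before the script does any work; the variable names listed are hypothetical and should match whatever names are set in the app's settings.

```python
import os

# Hypothetical names; use the names defined under the app's config vars.
REQUIRED_VARS = ("POSTCODE", "ACCOUNT_ID", "TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN")


def load_config(env=os.environ, required=REQUIRED_VARS):
    """Read the required settings, failing early with a clear message."""
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing config vars: " + ", ".join(missing))
    return {name: env[name] for name in required}
```

Failing early like this makes a forgotten config var show up as one clear error in the logs rather than as a KeyError partway through a scheduled run.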
The last step is getting our script to run automatically on a schedule. To do that, we have to install an add-on to the app. The Heroku Scheduler add-on lets us run jobs every 10 minutes, every hour, or every day.
Once Heroku Scheduler is installed, we can create a new job, choosing the frequency and the command we would like to run. As slots go very quickly, every 10 minutes is the best frequency for our job. The run command simply runs the Python script.
Now, we can sit back, relax, and wait for the text notifications!
We have developed many skills with this project, which opens up a world of possibilities for new projects:
We can now inspect a site with DevTools, reverse engineer an API, and use Python's Requests library to scrape data. This gives us the skills needed to scrape data from nearly any publicly available website.
We have set up Twilio, a communications API that lets us make calls and send texts. It provides an easy way of sending and receiving notifications from our scripts and also opens up more possibilities with Twilio: alert systems, chatbots, robo-callers, and more.
We have deployed this project using Heroku, allowing the script to run autonomously on a schedule in the cloud. This is an excellent skill to have, removing the dependency on running scripts locally on a PC or laptop and providing a fantastic opportunity to showcase projects online. Thanks a lot for reading this blog!
To know more, contact Actowiz Solutions! You can also reach us for all your mobile app and web scraping service requirements.