Scraping JavaScript-Intensive Websites Like an Expert Using Python

JavaScript has become ubiquitous on the web, posing a challenge for web scraping. While most data is easily accessible in the HTML of a page, there are instances where the data is only available after the JavaScript is executed and rendered. This complicates the scraping process.

In previous articles, we have covered web scraping using user-friendly Python libraries. However, these methods fall short when a page depends on JavaScript rendering. To tackle websites that hide their data behind JavaScript, we need more advanced tools and techniques.

Best Tool for this Job

Normally, we would suggest a couple of go-to libraries for web scraping:

  • BeautifulSoup
  • Requests

While the previously mentioned tools can handle various tasks, they cannot render JavaScript. Fortunately, there is a dedicated category of tools designed specifically for this purpose called Browser Automation tools.
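
To see the limitation concretely, here is a minimal sketch. The HTML string is a made-up stand-in for what a plain HTTP client such as Requests might receive from a JavaScript-driven page; the element id and the class name "movie" are illustrative assumptions, not taken from a real site.

```python
from bs4 import BeautifulSoup

# Hypothetical raw response from a JavaScript-driven page: the list container
# is empty, and a script would fill it in only when run inside a browser.
RAW_HTML = """
<html>
  <body>
    <div id="movie-list"></div>
    <script>
      const list = document.getElementById("movie-list");
      list.innerHTML = '<div class="movie">Example Movie</div>';
    </script>
  </body>
</html>
"""

soup = BeautifulSoup(RAW_HTML, "html.parser")

# BeautifulSoup only parses the markup it is given; it never runs the script,
# so the elements the script would create are not in the parsed tree.
movies = soup.find_all("div", class_="movie")
container = soup.find("div", id="movie-list")

print(len(movies))                     # number of movie elements found
print(repr(container.get_text(strip=True)))  # text inside the container
```

The parser finds the container but no movie elements, which is the gap a browser automation tool fills.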

Browser Automation tools are built to simulate and automate the web browsing experience, allowing tasks to be executed at intervals or speeds surpassing human capabilities. While these tools are commonly used for website testing by owners, they also offer the functionality required to render JavaScript and scrape the underlying data.

These tools provide a comprehensive solution, combining JavaScript rendering capabilities with web scraping functionality, which makes them ideal for extracting data from JavaScript-intensive websites.

By leveraging Browser Automation tools, you can effectively overcome the challenge of scraping websites that rely heavily on JavaScript.

Among the popular tools are:

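  • Scrapy (a crawling framework rather than a browser automation tool; it renders JavaScript only when paired with a plugin such as scrapy-playwright)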
  • Playwright
  • Selenium

For the purpose of this example, let's delve into using Selenium. Additionally, we will utilize the reliable BeautifulSoup library to parse the response and extract the desired data.

Set up a Workspace

To fully automate a web browser, some additional setup is necessary beyond the basic installation of libraries. We will need to install the following components:

  • Chrome Browser (or any other web browser of your choice, but we will use Chrome for this example)
  • ChromeDriver: the web driver specific to Chrome, which allows programs to interact with the browser

To follow along, you can install Chrome on your system. As for ChromeDriver installation, we can use a convenient Python library that handles the installation for us.

With that in mind, let's proceed with installing the required libraries.
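
As a sanity check after installation, the sketch below reports which of the required libraries Python cannot find. It assumes the usual install command shown in the comment, and the helper name missing_packages is our own invention for this example. It only confirms that each module is importable, not that Chrome itself is installed.

```python
from importlib.util import find_spec

# Import names of the libraries this tutorial relies on. Installed with:
#   pip install selenium bs4 chromedriver-autoinstaller
REQUIRED_MODULES = ("selenium", "bs4", "chromedriver_autoinstaller")


def missing_packages(modules=REQUIRED_MODULES):
    """Return the module names from `modules` that cannot be imported."""
    return [name for name in modules if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```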

Once they are installed, we can import what we need: webdriver from selenium, BeautifulSoup from bs4, and chromedriver_autoinstaller.

To simplify the installation process, we can use the chromedriver_autoinstaller library, which automatically installs ChromeDriver and adds it to the system's PATH if it's not already present. This saves us some effort and can be achieved with a single line of code:

By executing this code, the library will handle the installation of ChromeDriver seamlessly.

Here's a summary of the steps we've taken to set up our environment. Assuming Python is already installed, we used pip to install selenium, bs4, and chromedriver-autoinstaller, imported them in our Python file, and let chromedriver_autoinstaller install ChromeDriver.

Get a Webpage

Now that we have our environment set up, we can start making web page requests. To accomplish this, we need to configure the WebDriver object that Selenium will use. Here's an example of how you can set it up:

Now we can instruct the webdriver to retrieve a web page. For this example, we'll scrape Rotten Tomatoes Certified Fresh Movies.

Although the data we want (movie titles, ratings, etc.) can be obtained without rendering JavaScript, it's much easier to parse when it's rendered.

This page relies heavily on JavaScript, which is easiest to see by comparing the site with JavaScript enabled against the same page with JavaScript disabled.

[Figure: the page with JavaScript enabled, followed by the same page with JavaScript disabled]

We can request this web page using the driver object's get() method.

The rendered HTML is then available through the driver's page_source attribute:
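
The two steps, get() followed by page_source, can be sketched as one small helper. The function name fetch_rendered_html is our own, and the URL in the usage comment is a placeholder, since the article does not spell out the exact address.

```python
def fetch_rendered_html(driver, url: str) -> str:
    """Load `url` in the browser and return the HTML after rendering."""
    # get() navigates to the page and, by default, blocks until the browser
    # reports that the document has finished loading.
    driver.get(url)
    # page_source is a serialization of the current DOM, so it reflects
    # changes made by scripts that have already run.
    return driver.page_source


# Usage sketch (needs Chrome and ChromeDriver; the URL is a placeholder):
#
#     from selenium import webdriver
#
#     driver = webdriver.Chrome()
#     try:
#         html = fetch_rendered_html(driver, "https://example.com/")
#     finally:
#         driver.quit()  # always close the browser, even on errors
```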

To recap, the code so far installs ChromeDriver, configures the WebDriver, requests the page with get(), and reads the rendered HTML from page_source.

Parse the HTML

Now, Selenium can handle parsing the data, but in most cases, we'll rely on BeautifulSoup for parsing the HTML. Let's create a BeautifulSoup object from the page source:
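
A minimal sketch of that step, using Python's built-in html.parser backend. The helper name make_soup is ours, and the sample string stands in for driver.page_source so the snippet runs without a browser.

```python
from bs4 import BeautifulSoup


def make_soup(page_source: str) -> BeautifulSoup:
    """Parse rendered HTML with Python's built-in html.parser backend."""
    return BeautifulSoup(page_source, "html.parser")


# Stand-in for driver.page_source (made-up markup for illustration).
sample_source = "<html><head><title>Sample Page</title></head><body></body></html>"
soup = make_soup(sample_source)
print(soup.title.get_text())
```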

At this point, the script should fetch the rendered page with Selenium and hold its HTML in a BeautifulSoup object.

If we were to print the soup object we've created, we would see the entire web page, excluding some of the fancy formatting. Fortunately, in this case, we don't have to wait for JavaScript execution.

However, in some scenarios, we may need to wait for JavaScript execution. This can be achieved either through Implicit Waiting or Explicit Waiting.

Since we don't need to worry about that here, let's focus on finding the information we're interested in.

It appears that all the movies we're interested in are contained within div elements with the class "mb-movie". Each of these divs contains information about an individual movie.

To extract the relevant information, we can use BeautifulSoup's find_all() method with the appropriate parameters:
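
A minimal sketch of that call. The class name "mb-movie" comes from the page as described above; the sample HTML is a made-up, trimmed-down stand-in for the rendered page source so the snippet runs on its own.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the rendered page source.
SAMPLE_HTML = """
<div class="mb-movie"><h3>Example Movie One</h3></div>
<div class="mb-movie featured"><h3>Example Movie Two</h3></div>
<div class="sidebar"><h3>Not a movie</h3></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# class_ (with the trailing underscore) filters on the CSS class; a tag
# matches if "mb-movie" is any one of its classes.
movie_divs = soup.find_all("div", class_="mb-movie")
print(len(movie_divs))
```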

We can then loop over each of them and easily pull out the title, release date, and score using BeautifulSoup:
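
The loop might look like the sketch below. Only the "mb-movie" container class comes from the article; the inner tags and class names (title, release-date, score) are assumptions made for illustration, so inspect the real page to find the right ones. The helper names text_or_none and extract_movies are ours.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the inner tags and class names are assumptions.
SAMPLE_HTML = """
<div class="mb-movie">
  <h3 class="title">Example Movie One</h3>
  <p class="release-date">Jan 5</p>
  <span class="score">97%</span>
</div>
<div class="mb-movie">
  <h3 class="title">Example Movie Two</h3>
  <span class="score">88%</span>
</div>
"""


def text_or_none(parent, tag, css_class):
    """Return the stripped text of the first match, or None if absent."""
    node = parent.find(tag, class_=css_class)
    return node.get_text(strip=True) if node is not None else None


def extract_movies(html):
    """Build one dict per div.mb-movie container found in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {
            "title": text_or_none(div, "h3", "title"),
            "release_date": text_or_none(div, "p", "release-date"),
            "score": text_or_none(div, "span", "score"),
        }
        for div in soup.find_all("div", class_="mb-movie")
    ]


for movie in extract_movies(SAMPLE_HTML):
    print(movie)
```

Returning None for a missing field, rather than raising, keeps one incomplete entry from stopping the whole scrape.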

Conclusion

In summary, we have accomplished several tasks in a short period of time:

  • Installed Chrome and ChromeDriver.
  • Used a Python library to install ChromeDriver automatically.
  • Fetched a web page that heavily relies on JavaScript using Selenium.
  • Parsed and extracted data from the web page using BeautifulSoup.

[Figure: a final overview of our progress, with the data printed in the terminal]

For more information, contact Actowiz Solutions now! Call us for all your mobile app scraping and data collection service requirements.
