How to Scrape E-Commerce Website Data to Compare Prices Using Python: Part 2

In Part 1 of this two-part series on scraping e-commerce websites for price comparison, we used the Selenium package for Python to automate extracting product names and prices from the Lazada website.

In Part 2, we continue with the Shopee website. Rather than repeating the steps from Part 1, we concentrate on the particular challenges of scraping Shopee. We also introduce an alternative to Selenium that worked better!

So, let’s begin!

Scraping the Shopee website

Scraping the Shopee website wasn’t easy with Selenium. We highlight four extra complexities that the Shopee website had and the Lazada website didn’t:

1. Popup Alerts (Extra Complexity = Low)

The first issue we meet is the popup alerts that appear when you search.

We can automate clicking away the popup boxes using Selenium with a short script.
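
A minimal sketch of what such a script could look like. The XPath in `POPUP_CLOSE_XPATH` is a hypothetical placeholder (the real close button has to be looked up with the browser's inspect tool), and `dismiss_popups` is an illustrative helper name. The function only relies on Selenium's `find_elements`, `is_displayed` and `click`, so the driver is passed in rather than created here.

```python
import time

# Hypothetical locator for the popup's close button; look up the real one
# with the browser's inspect tool.
POPUP_CLOSE_XPATH = '//div[@class="popup"]//button[@class="close"]'


def dismiss_popups(driver, xpath=POPUP_CLOSE_XPATH, timeout_s=5.0, poll_s=0.5):
    """Click every visible popup close button that shows up within timeout_s.

    `driver` is a Selenium WebDriver. Returns the number of buttons clicked.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        # "xpath" is the value of Selenium's By.XPATH constant.
        buttons = [b for b in driver.find_elements("xpath", xpath) if b.is_displayed()]
        if buttons:
            for button in buttons:
                button.click()
            return len(buttons)
        if time.monotonic() >= deadline:
            return 0  # no popup appeared, nothing to dismiss
        time.sleep(poll_s)
```

With a real browser session, this would be called as `dismiss_popups(driver)` right after the search page has been opened.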

2. Different Prices for the Same Item (Extra Complexity = Low)

We also found that in Shopee search results, one item sometimes has two different price figures with the same class name. The two prices represent a price range, for example where an item has a volume discount.

Using Selenium, we can pick the particular figure we need with an XPath selector that chooses the second span element, which holds the first price figure.
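
To make the `span[2]` idea concrete, here is a small sketch that runs the same kind of path expression against made-up markup using Python's built-in `xml.etree.ElementTree`. The class name `item-price` and the layout (a currency span followed by a figure span) are assumptions for illustration, not Shopee's actual markup.

```python
import xml.etree.ElementTree as ET

# Made-up markup: each price block holds a currency span followed by a figure
# span; a price range simply repeats the pair.
SAMPLE_HTML = """
<div class="results">
  <div class="item-price"><span>$</span><span>12.90</span></div>
  <div class="item-price"><span>$</span><span>10.50</span> - <span>$</span><span>14.00</span></div>
</div>
"""

# span[2] picks the second <span> child of each price block, i.e. the first
# price figure, whether or not a range follows.
FIRST_PRICE_PATH = ".//div[@class='item-price']/span[2]"


def first_prices(markup):
    """Return the text of the second span in every price block."""
    root = ET.fromstring(markup.strip())
    return [span.text for span in root.findall(FIRST_PRICE_PATH)]


print(first_prices(SAMPLE_HTML))  # prints ['12.90', '10.50']
```

With Selenium, an equivalent expression would be passed to the driver instead, along the lines of `driver.find_elements(By.XPATH, '//div[@class="item-price"]/span[2]')`.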

3. Search Returns 50 Items per Page but Only 15 Get Selected (Extra Complexity = High)

The Shopee site is a dynamic site, where page elements appear only as you scroll down the page. This isn’t unusual, because it helps a page load faster by not loading all elements immediately (Facebook works the same way).

However, this means we need to automate scrolling to the bottom of the page, as you would do manually, with short waiting times for all page elements to appear.

Selenium does allow us to automate browser scrolling, but the script can get lengthy, because you need to imitate the manual procedure: scroll a bit more, wait a few seconds for page elements to appear, then rinse and repeat until you reach the end of the page.

We could write the script like this:
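
A minimal sketch of such a scroll loop, assuming a Selenium WebDriver is passed in as `driver`. The helper name `scroll_to_bottom` and the step size, pause and step limit are illustrative choices that would need tuning for the real page.

```python
import time


def scroll_to_bottom(driver, step_px=800, pause_s=1.0, max_steps=100):
    """Scroll down in small steps until the bottom of the page is reached.

    Returns the number of scroll steps performed.
    """
    steps = 0
    while steps < max_steps:
        driver.execute_script("window.scrollBy(0, arguments[0]);", step_px)
        steps += 1
        time.sleep(pause_s)  # give lazily loaded items time to render
        # Re-read both values after the pause, since newly loaded items can
        # make the page taller.
        position = driver.execute_script(
            "return window.pageYOffset + window.innerHeight;"
        )
        height = driver.execute_script("return document.body.scrollHeight;")
        if position >= height:
            break
    return steps
```

Once the loop finishes, the item elements can be collected in the same way as in Part 1.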

Here we can see that the code has become much more complex, and the automation has also become slower because of the extra pause times.

4. The Product Name Elements Just Can’t Be Selected

As observed earlier, the product names can’t be selected, although they can be located with either XPath or class selectors and can be seen with the Chrome inspect tool. As a result, running find_element doesn’t return the expected item names, only empty strings.

To get around this, we would have to write some JavaScript to manipulate a CSS property, and JavaScript is a language we are extremely unfamiliar with.

Fortunately, we found an easier way of scraping Shopee data: using Shopee’s API to request the search results.

We were extremely lucky to find this on the web. Not every website has an API or shares it with you. Because Shopee lets you use the API to extract product information directly, it is much easier to use that than to automate the extraction with Selenium.
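
A rough sketch of the API approach using only the standard library. The endpoint in `SEARCH_ENDPOINT` is a placeholder, and the query parameters (`keyword`, `limit`) and response fields (`items`, `name`, `price`) are assumptions for illustration; the real URL and field names have to be taken from the site's actual search requests.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder endpoint (assumption): replace with the real search API URL.
SEARCH_ENDPOINT = "https://example.com/api/search_items"


def build_search_url(keyword, limit=50, endpoint=SEARCH_ENDPOINT):
    """Build the search URL with the query parameters encoded."""
    return endpoint + "?" + urlencode({"keyword": keyword, "limit": limit})


def fetch_search_results(keyword, limit=50):
    """Request one page of search results and return the parsed JSON."""
    request = Request(
        build_search_url(keyword, limit),
        headers={"User-Agent": "Mozilla/5.0"},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def extract_names_and_prices(payload):
    """Pull (name, price) pairs out of the assumed response layout."""
    return [(item["name"], item["price"]) for item in payload.get("items", [])]
```

Compared with the Selenium route, there is no browser to drive, no popup to close and no scrolling: one request returns all the items on a result page as structured data.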

Now, we will make a pandas dataframe for organizing all the data:
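
As an illustration, the dataframe could be built as follows. The rows are made-up sample values, not scraped results, and the column names `name`, `price` and `platform` are assumptions.

```python
import pandas as pd

# Made-up sample of (name, price) pairs standing in for the scraped results.
scraped_items = [
    ("Brand A Milk Powder 900g", 4590000),
    ("Brand A Milk Powder 900g Twin Pack", 8800000),
    ("Milk Bottle Brush", 350000),
]

shopee_df = pd.DataFrame(scraped_items, columns=["name", "price"])
shopee_df["platform"] = "Shopee"  # tag the source for the later comparison

print(shopee_df)
```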

Printing the dataframe shows the scraped product names and prices.

As with the Lazada dataset, we also need to clean this dataset. The key things we have to do are:

  • Converting the price column from integers into two-decimal floats
  • Removing unrelated entries from the dataset
  • Removing twin packs
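
The three cleaning steps above can be sketched as a single function. The scale factor in `PRICE_SCALE` is an assumption about how the integer prices are encoded and should be checked against real data. The keyword filter and the "twin" filter are simple illustrative rules, and `clean_listings` is a hypothetical helper name.

```python
import pandas as pd

# Assumption: integer prices are the real price multiplied by 100,000.
PRICE_SCALE = 100_000


def clean_listings(df, keyword, price_scale=PRICE_SCALE):
    """Return a cleaned copy of df with float prices and only relevant rows."""
    cleaned = df.copy()
    # 1. Integer price -> float rounded to two decimals.
    cleaned["price"] = (cleaned["price"] / price_scale).round(2)
    # 2. Keep only rows whose name mentions the search keyword.
    is_relevant = cleaned["name"].str.contains(keyword, case=False, regex=False)
    # 3. Drop twin packs, which would distort a per-item comparison.
    is_twin_pack = cleaned["name"].str.contains("twin", case=False, regex=False)
    return cleaned[is_relevant & ~is_twin_pack].reset_index(drop=True)
```

Working on a copy keeps the raw scraped data intact, so the cleaning rules can be adjusted and re-run without scraping again.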

Now, it’s time to combine the Shopee and Lazada datasets! We do that by concatenating the two dataframes with pandas.
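
A small sketch of the concatenation with `pd.concat`, using made-up rows and assuming both dataframes share the same columns:

```python
import pandas as pd

# Made-up rows; both frames are assumed to share the same columns.
lazada_df = pd.DataFrame(
    {"name": ["Item A"], "price": [45.90], "platform": ["Lazada"]}
)
shopee_df = pd.DataFrame(
    {"name": ["Item A", "Item B"], "price": [42.50, 44.00], "platform": ["Shopee", "Shopee"]}
)

# ignore_index=True renumbers the rows 0..n-1 instead of repeating each
# frame's own index.
combined_df = pd.concat([lazada_df, shopee_df], ignore_index=True)
print(combined_df)
```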

Now we need to compare the two platforms. We can print the dataframe’s summary statistics using the describe method:
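
One way to get per-platform statistics is to group by platform before calling `describe`. The grouping step and the sample values below are assumptions for illustration.

```python
import pandas as pd

# Made-up prices for illustration only.
combined_df = pd.DataFrame({
    "platform": ["Lazada", "Lazada", "Shopee", "Shopee", "Shopee"],
    "price": [40.0, 50.0, 30.0, 40.0, 50.0],
})

# One row of count/mean/std/min/quartiles/max per platform.
summary = combined_df.groupby("platform")["price"].describe()
print(summary)
```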

We then plot the data using a box plot similar to the one created in Part 1:
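
A sketch of such a plot using the pandas `boxplot` method on top of matplotlib. The styling may differ from the Part 1 plot, the sample prices are made up, and `plot_price_boxplot` is a hypothetical helper name.

```python
import io

import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_price_boxplot(df, output):
    """Draw one price box per platform and save the figure to `output`."""
    fig, ax = plt.subplots(figsize=(6, 4))
    df.boxplot(column="price", by="platform", ax=ax)
    ax.set_title("Price by platform")
    ax.set_ylabel("Price")
    fig.suptitle("")  # drop the automatic "Boxplot grouped by ..." title
    fig.savefig(output, format="png")
    plt.close(fig)
    return ax


# Made-up prices for illustration only.
combined_df = pd.DataFrame({
    "platform": ["Lazada", "Lazada", "Shopee", "Shopee", "Shopee"],
    "price": [40.0, 50.0, 30.0, 40.0, 50.0],
})
buffer = io.BytesIO()  # a file path such as "prices.png" works too
axes = plot_price_boxplot(combined_df, buffer)
```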

And that’s it! Based on this one-item comparison, it does look like Shopee is the cheaper platform (and it also has more items).

Some notes before we finish off:

a) It’s useful to run the price comparison across different time periods to analyze the pricing trend of a particular item. To do that, we could add a datetime column and save the data to a CSV file.

b) Though you can scrape other items just by changing the keyword_search variable, you might have to clean the dataset differently from the given example.

c) This example uses a small dataset, so the scraping and cleaning exercise was quick.
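
For note (a), a minimal sketch of timestamping each run and appending it to a CSV file. The column name `scraped_at` and the helper `save_snapshot` are illustrative names, not part of the original code.

```python
import os
from datetime import datetime

import pandas as pd


def save_snapshot(df, csv_path, scraped_at=None):
    """Append df to csv_path with a `scraped_at` timestamp column."""
    snapshot = df.copy()
    # Same timestamp for every row of this run; defaults to the current time.
    snapshot["scraped_at"] = scraped_at or datetime.now()
    write_header = not os.path.exists(csv_path)  # header only for a new file
    snapshot.to_csv(csv_path, mode="a", header=write_header, index=False)
    return snapshot
```

Each run then adds its rows to the same file, and loading it back with `pd.read_csv(csv_path, parse_dates=["scraped_at"])` gives a price history that can be grouped by date.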

That’s it for now!

For more information about scraping e-commerce website data to compare prices using Python, contact Actowiz Solutions now!

You can also reach us for all your mobile app scraping and web scraping services requirements.
