How to Scrape E-Commerce Data from Tokopedia Using Web Scraping

Data has become a powerful asset that shapes decisions around the world. It can guide a company’s next steps, such as increasing sales by offering products that match customers’ tastes or using artificial intelligence to reduce manual work.

This blog shows how to extract data from a live website, a process generally called data scraping. As the example, we will use Tokopedia, an Indonesian e-commerce website.

The first step in data scraping is deciding which data we want to collect. Here, we want shoe (sepatu) data, sorted by review (ulasan).

Let’s look at the site. Like any website, it is built with the markup language HTML, and we can find the data we want by inspecting the HTML of the page carefully.

First, let’s open the Tokopedia page at https://www.tokopedia.com.

Next, let’s search for shoes in the search bar. The Indonesian word for shoes is “sepatu”, so we type “sepatu” into the search bar.

By default, the results are sorted by relevance, so let’s sort them by review instead by changing the “Urutkan” dropdown to “Ulasan”.

Let’s look at the HTML by using the browser’s Inspect Element tool and pointing at a product card.

We can see that each card has a class called css-y5gcsw. Inside a card, we can see several pieces of data about the product.

We are interested in the name, price, city, and image URL of each product, so let’s look at the HTML elements that hold this data.

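We can see that the name uses the css-1b6t4dn class, the price uses the css-1ksb19c class, the city uses the css-1kdc32b class, and the image uses the css-1c345mg class. These look like auto-generated class names, so they may change over time; check them against the live page before running the script.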
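
The class names identified so far can be kept in one place so that the rest of the script does not repeat them. The following is only a minimal sketch: the names CARD_CLASS, FIELD_CLASSES, SELECTORS, and css() are illustrative and not part of the original script, and the class names are the ones observed above.

```python
# Class names observed on the search results page (they may change over time).
CARD_CLASS = "css-y5gcsw"
FIELD_CLASSES = {
    "name": "css-1b6t4dn",
    "price": "css-1ksb19c",
    "city": "css-1kdc32b",
    "img": "css-1c345mg",
}


def css(class_name: str) -> str:
    """Turn a class name into a CSS class selector by prefixing a dot."""
    return "." + class_name


# Selector strings that a browser automation library can search for.
SELECTORS = {field: css(name) for field, name in FIELD_CLASSES.items()}
```

If the site changes its markup, only this mapping needs to be updated.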

Now that we have identified the relevant HTML on this page, let’s write a script to get the data from it.

Because Tokopedia is built with a JavaScript framework, we will use a browser automation library called Selenium, which lets us read data from the rendered HTML. You need to install the library first, and we also need a browser. You can follow the Selenium installation guide, and we suggest using a Python virtual environment for the project. For the browser, we will use Firefox for the automation.

After that, create a file called scraper.py to hold the scraper.

Let’s create a class called Scraper, which is responsible for getting data from the website. In it, we define a property called driver, which holds a Selenium WebDriver. A WebDriver is the object Selenium uses to create a session with a browser and communicate with it, so when the WebDriver tells the browser to open a page, the page opens in the browser. To create a WebDriver object connected to Firefox, we call Firefox() from Selenium’s webdriver module.
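
A minimal sketch of such a class could look like the following. It assumes Selenium 4 and Firefox with a working driver are installed. The optional driver argument and the close() method are our own additions, not something the description above requires; the argument lets the class be tried with a stand-in object without opening a browser.

```python
class Scraper:
    """Holds the Selenium WebDriver used to control the browser."""

    def __init__(self, driver=None):
        if driver is None:
            # Imported here so this module can be loaded without Selenium;
            # webdriver.Firefox() starts a new Firefox session.
            from selenium import webdriver

            driver = webdriver.Firefox()
        self.driver = driver

    def close(self):
        """End the browser session when scraping is finished."""
        self.driver.quit()
```

Calling Scraper() with no arguments opens a Firefox window that the script then controls.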

Next, let’s write a function called get_data() to get data from the website. For this, we need the URL of the search results page. If we look at the website again, we can read that URL from the browser’s address bar.

Let’s have the driver tell the browser to open that URL by calling driver.get("URL").

After that, create a counter for the current results page and a list to hold the data.
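
The opening of get_data() might be sketched as below. For brevity it is written as a standalone function that receives the driver; inside the Scraper class it would be a method using self.driver. The search URL is not reproduced here, so it is passed in as a parameter, and the names url, counter_page, and datas are assumptions.

```python
def get_data(driver, url):
    """Open the search results page and prepare the bookkeeping variables."""
    # Ask the browser to load the search results page.
    driver.get(url)

    counter_page = 0  # number of result pages processed so far
    datas = []        # collected product records

    # Scrolling, extraction, and pagination would go here,
    # updating counter_page and appending to datas.

    return datas
```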

We will collect data up to page 10. On every page, we have the driver scroll the browser to the bottom, because the page does not load its product data unless we scroll through it. When we checked, the page was about 6,500 pixels tall, so we scroll 500 pixels at a time. After each scroll step, we wait 0.1 seconds so that we do not put too much load on the server at once.
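
The scrolling step can be sketched as follows, using the page height, step size, and pause mentioned above. The function name scroll_page and the sleep parameter are illustrative assumptions; the parameter only exists so the pause can be replaced when trying the function without a browser.

```python
import time

PAGE_HEIGHT = 6500   # approximate page height in pixels
SCROLL_STEP = 500    # pixels scrolled per step
PAUSE_SECONDS = 0.1  # wait between steps


def scroll_page(driver, sleep=time.sleep):
    """Scroll down in fixed steps so lazily loaded products appear."""
    for _ in range(0, PAGE_HEIGHT, SCROLL_STEP):
        # execute_script runs JavaScript in the page; scrollBy moves the
        # viewport down relative to its current position.
        driver.execute_script(f"window.scrollBy(0, {SCROLL_STEP});")
        sleep(PAUSE_SECONDS)
```

With these values the loop performs 13 scroll steps per page, which takes a little over a second.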

After the scrolling loop, we get the card elements, iterate over them, extract the name, price, image, and city of each product, and append the data to the datas variable.
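
A sketch of the extraction step is shown below. It assumes that each field sits inside the card under the class names found earlier and that the css-1c345mg class is on the img element itself, so its src attribute holds the image URL. The function name extract_products and the dictionary keys are illustrative.

```python
CSS = "css selector"  # same value as Selenium's By.CSS_SELECTOR


def extract_products(driver):
    """Return one dict per product card currently present in the page."""
    datas = []
    for card in driver.find_elements(CSS, ".css-y5gcsw"):
        # Each lookup is scoped to the card, not the whole page.
        name = card.find_element(CSS, ".css-1b6t4dn").text
        price = card.find_element(CSS, ".css-1ksb19c").text
        city = card.find_element(CSS, ".css-1kdc32b").text
        img = card.find_element(CSS, ".css-1c345mg").get_attribute("src")
        datas.append({"name": name, "price": price, "city": city, "img": img})
    return datas
```

Note that find_element raises an exception when a card lacks one of the fields, so a more defensive version may want to catch that and skip the card.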

Once we have all the data on a page, we move to the next page by having the driver click the button for that page. If we check the HTML of the page, we find that each page button has the css-1ix4b60-unf-pagination-item class, and we can use the counter variable to indicate which button to click.
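
One possible way to click the right button is sketched below. It assumes that the pagination items are button elements whose visible text is the page number; this is a guess about the markup rather than something confirmed above, and the function name go_to_page is illustrative.

```python
XPATH = "xpath"  # same value as Selenium's By.XPATH


def go_to_page(driver, page_number):
    """Click the pagination button whose label is `page_number`."""
    locator = (
        "//button[contains(@class, 'css-1ix4b60-unf-pagination-item')"
        f" and normalize-space(text())='{page_number}']"
    )
    driver.find_element(XPATH, locator).click()
```

For example, after finishing the first page, go_to_page(driver, 2) would move to the second page.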

Finally, return the data as the function’s return value.

The complete scraper.py combines all of the steps above.

Now, let’s create a file called “main.py” to check that the class works. It should create a Scraper, call get_data(), and print the results.
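
A sketch of what main.py could contain is given below. It assumes that scraper.py sits in the same folder, that get_data() takes no arguments because the URL is stored inside the class, and that it returns a list of dictionaries with name, price, city, and img keys. The helper format_product is illustrative.

```python
def format_product(index, data):
    """Build one printable line for a scraped product."""
    return f"{index}. {data['name']} | {data['price']} | {data['city']} | {data['img']}"


def main():
    # Imported here so the formatting helper can be used on its own.
    from scraper import Scraper

    scraper = Scraper()
    datas = scraper.get_data()
    for index, data in enumerate(datas, start=1):
        print(format_product(index, data))
    print(f"Total products: {len(datas)}")
```

Adding a call to main() at the end of the file runs the scraper when the file is executed with python main.py.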

When we run the file, a Firefox browser opens and automatically performs the search as the driver instructs in our code. After that, we can see the results in the terminal.

We can see that we collected data for 700 products from the shoe search page!

As a next step, we could present the data in a format other than printing it directly in the terminal.
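
As one possible alternative format, the scraped records could be written out as CSV using only the Python standard library. This is a sketch under the assumption that each record is a dictionary with name, price, city, and img keys; the function name to_csv is illustrative.

```python
import csv
import io

FIELDS = ["name", "price", "city", "img"]


def to_csv(datas):
    """Serialize product dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(datas)
    return buffer.getvalue()
```

The returned text can be saved to a file such as products.csv and opened in a spreadsheet program.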
