Decathlon is a well-known retailer in the sporting goods industry, offering a wide range of products such as shoes, sports apparel, and equipment. Scraping Decathlon's site can offer important insights into product pricing, trends, and market dynamics. In this blog, we will explore how to scrape apparel data from Decathlon's site by category using a combination of Python and Playwright.
Playwright is an automation library that allows you to control web browsers such as Firefox, WebKit, and Chromium using programming languages like JavaScript and Python. It is an excellent tool for web scraping because it automates tasks like filling forms, clicking buttons, and scrolling. Using Playwright, we will navigate through each category on Decathlon's website and extract product information, including name, pricing, and description.
By the end of this blog, you'll have a fundamental understanding of how to use Python and Playwright to extract data from Decathlon's site by category. We'll scrape several data attributes from individual product pages: Product URL, Brand, MRP, Product Name, Number of Reviews, Sale Price, Color, Features, Ratings, and Product Information.
Let's go through a step-by-step guide to using Python and Playwright to extract apparel data from Decathlon by category.
To begin the process, we need to import the necessary libraries that will enable us to interact with the Decathlon website and extract the desired information. Here are the libraries we need to import:
The required libraries for scraping data from Decathlon's website by category using Playwright and Python are:
random: This library generates random numbers. In this tutorial, we use it to add a random wait time between retries, which makes the scraper's request pattern less predictable and avoids hammering the site with rapid repeated requests.
asyncio: This library is essential for handling asynchronous programming in Python. Since we will be using the asynchronous API of Playwright, asyncio allows us to write asynchronous code, making it easier to work with Playwright effectively.
pandas: This library is widely used for data analysis and manipulation. It provides convenient data structures and functions for handling tabular data. In this tutorial, we use pandas to store the scraped data and export it to a CSV file.
async_playwright: This is the asynchronous API of Playwright, which is used to automate browsers. With the asynchronous API, you can perform multiple operations concurrently, making your scraping tasks faster and more efficient.
By importing these libraries, you'll have the tools to automate browser interactions, handle asynchronous operations, and store and analyze scraped data from Decathlon's website using Playwright and Python.
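For reference, a minimal import block covering the four libraries above looks like this:

```python
# Core imports for the tutorial (async Playwright API)
import random    # random wait times between retries
import asyncio   # run the async scraping coroutines

import pandas as pd                                 # tabular storage + CSV export
from playwright.async_api import async_playwright   # async browser automation
```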
The next step is to extract the URLs of the apparel products based on their respective categories. We will navigate through the Decathlon website and collect the URLs for each product in each category.
To extract the product URLs from a web page, we can define a Python function called get_product_urls. This function uses Playwright to drive the browser and collect the product URLs it finds.
In this function, we start by initializing an empty list called product_urls. We use page.query_selector_all to find all the elements on the page that contain the product URLs, using the CSS selector a.product-link. We then iterate through each element and extract the href attribute, which contains the URL of the product page. The extracted URLs are appended to the product_urls list.
Next, we check if there is a "next" button on the page using page.query_selector. If a "next" button exists, we click on it using next_button.click(), and then recursively call the get_product_urls function to extract URLs from the next page. The URLs extracted from the subsequent pages are also appended to the product_urls list.
Finally, we return the product_urls list, which will contain all the extracted product URLs from the web page.
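Here is a minimal sketch of how get_product_urls might look. The a.product-link selector follows the description above; the "next" button selector is a placeholder, so confirm both by inspecting the live page:

```python
async def get_product_urls(browser, page):
    """Collect product URLs from the current page, following pagination."""
    # `browser` is accepted to match the call signature used later in the article.
    product_urls = []

    # Every product card link on the page (verify against the live markup).
    for element in await page.query_selector_all("a.product-link"):
        href = await element.get_attribute("href")
        if href:
            product_urls.append(href)

    # If a "next" button exists, click it and recurse into the following page.
    # "a.next-page" is a placeholder selector.
    next_button = await page.query_selector("a.next-page")
    if next_button:
        await next_button.click()
        await page.wait_for_load_state("domcontentloaded")
        product_urls += await get_product_urls(browser, page)

    return product_urls
```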
You can call this function within the previous scrape_urls function, passing the browser and page instances, to extract the product URLs for each category.
To scrape product URLs for a given product category, we first click the product category button to expand the list of available categories. Then, we click on each category in turn to filter the products and extract the corresponding URLs.
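A sketch of that flow is below; both selectors (button.category-filter and li.category-option) are hypothetical placeholders for whatever Decathlon's filter widget actually uses:

```python
async def get_urls_by_category(browser, page):
    """Expand the category filter, then collect product URLs per category."""
    urls_by_category = {}

    # Open the filter once to count the available categories.
    await page.click("button.category-filter")
    count = len(await page.query_selector_all("li.category-option"))

    for i in range(count):
        # Re-open and re-query each iteration: selecting a category reloads
        # the product grid and can detach earlier element handles.
        await page.click("button.category-filter")
        options = await page.query_selector_all("li.category-option")
        name = (await options[i].text_content() or "").strip()

        await options[i].click()  # filter the products to this category
        await page.wait_for_load_state("domcontentloaded")
        urls_by_category[name] = await get_product_urls(browser, page)

    return urls_by_category
```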
In the previous step, we filtered the products by category and obtained the respective product URLs. Now, we will proceed to extract the names of the products from the web pages.
To extract the product names, we can define a function called extract_product_names. This function will take the page instance as a parameter and utilize Playwright's query selector to identify the elements containing the product names. We will then iterate through these elements and extract the text content, which represents the product names.
In this function, we start by initializing an empty list called product_names. We use page.query_selector_all with the CSS selector .product-name to find all elements on the page that contain the product names. Then, we iterate through each element and extract the text content using element.text_content(). The extracted names are appended to the product_names list.
You can call this function after navigating to a specific product page within your scraping script. By doing so, you will obtain a list of the product names from the page, which can be further processed or stored as needed.
In the previous step, we defined extract_product_names to pull product names from the web pages. Let's now add error handling so it copes with missing elements.
In this function, we attempt to find the corresponding product name element on the page using page.query_selector and passing the appropriate CSS selector, which is .product-name. If the element is found, we retrieve the text content using name_element.text_content() and append it to the product_names list. If the element is not found or if an exception occurs during the process, we append the string "Not Available" to the product_names list.
By handling potential exceptions and setting a default value of "Not Available" when the product name element is not found, we ensure that the function runs smoothly and provides a meaningful result even in case of unexpected situations.
You can call this function within your scraping script after navigating to the product page to extract the name of the product.
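As a sketch, here is a per-page variant (extract_product_name, singular) that returns one value with the "Not Available" fallback described above; the list-building version from the earlier step simply appends this result to product_names:

```python
async def extract_product_name(page):
    """Return the product name, or "Not Available" when it can't be found."""
    try:
        # Selector taken from the description above; confirm it on the live page.
        name_element = await page.query_selector(".product-name")
        return (await name_element.text_content()).strip()
    except Exception:
        # Element missing or text extraction failed.
        return "Not Available"
```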
The next step is to extract the brand of each product from its product page, using an extract_product_brands function built on the same pattern.
In the previous steps, we have seen how to extract the product name and brand from web pages. Now, let's move on to extracting other attributes such as sale price, MRP, ratings, color, total reviews, product information, and features.
For each of these attributes, we can define separate functions that use the query_selector method and text_content method, or similar methods, to select the relevant element on the page and extract the desired information. These functions will follow a similar structure as the extract_product_names and extract_product_brands functions.
In these functions, we attempt to locate the corresponding elements using appropriate CSS selectors for each attribute. If the element is found, we extract its text content and assign it to the respective variable. If an exception occurs during the process, we set the variable to an empty string or any default value that suits your needs.
You can call these functions within your scraping script, after navigating to the product page, to extract the desired attributes for each product.
Remember to adjust the CSS selectors used in these functions based on the structure of the web page you are scraping. Inspecting the HTML structure of the web page will help you identify the appropriate selectors for each attribute.
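One way to avoid repeating the same try/except boilerplate is a small shared helper, sketched below. Every selector here is a placeholder to be verified against Decathlon's current markup:

```python
async def extract_text(page, selector, default="Not Available"):
    """Fetch the stripped text of the first element matching `selector`."""
    try:
        element = await page.query_selector(selector)
        return (await element.text_content()).strip()
    except Exception:
        return default

# Thin per-attribute wrappers (all selectors are hypothetical):
async def extract_brand(page):
    return await extract_text(page, ".product-brand")

async def extract_sale_price(page):
    return await extract_text(page, ".product-price .sale")

async def extract_mrp(page):
    return await extract_text(page, ".product-price .mrp")

async def extract_star_rating(page):
    return await extract_text(page, ".review-rating")
```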
Next, let's look at the function that extracts the product description section from a Decathlon product page.
In this function, we attempt to locate the element containing the product description using the CSS selector .product-description. If the element is found, we extract its text content and assign it to the description variable.
To remove unwanted characters, we split the description text by the newline character (\n) and use a list comprehension to filter out any elements that are empty or contain only whitespace. We then use the strip() method to remove leading and trailing spaces from each remaining element. The resulting list of strings represents the cleaned product description.
By using this function in your scraping script, you can extract the product description from each product page and store it as a list of strings for further processing or analysis.
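A sketch of that description extractor, using the .product-description selector and the cleaning steps just described:

```python
async def extract_product_description(page):
    """Return the product description as a list of cleaned lines."""
    try:
        element = await page.query_selector(".product-description")
        description = await element.text_content()
        # Split on newlines, drop empty/whitespace-only pieces, trim the rest.
        return [line.strip() for line in description.split("\n") if line.strip()]
    except Exception:
        return []
```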
The get_product_information function is an asynchronous function that takes a page object as its parameter and scrapes the product details section from a Decathlon product page; a sketch of one possible implementation follows below.
By using this function, you can extract the product information from each product page on Decathlon's website, providing valuable insights and data for further analysis or processing.
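The article doesn't show the body of get_product_information, so the sketch below is one plausible reading, assuming the details render as "label: value" rows; the .product-information li selector is an assumption:

```python
async def get_product_information(page):
    """Scrape "label: value" detail rows into a dictionary (assumed layout)."""
    info = {}
    for row in await page.query_selector_all(".product-information li"):
        text = (await row.text_content() or "").strip()
        if ":" in text:
            label, value = text.split(":", 1)
            info[label.strip()] = value.strip()
    return info
```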
Implementing a retry mechanism is an important aspect of web scraping as it helps handle temporary network errors or unexpected responses from the website. The goal is to increase the chances of success by sending the request again if it fails initially.
In this script, a retry mechanism is implemented before navigating to a URL. It uses a while loop that keeps trying to navigate to the URL until either the request succeeds or the maximum number of retries is reached. If the maximum is reached without a successful request, the script raises an exception to signal the failure.
This function is particularly useful when scraping web pages because requests may occasionally time out or fail due to network issues. By incorporating a retry mechanism, you improve the reliability of the scraping process and increase the likelihood of obtaining the desired data.
The perform_request_with_retry function is responsible for navigating to a particular link using the goto method of Playwright's page object, and it incorporates a retry mechanism in case the initial request fails.
Here's how the function works: it performs a request to the given link via page.goto and allows the request to be retried up to a maximum number of times, defined by the MAX_RETRIES constant.
If the initial request fails, the function retries the request by entering a while loop. Between each retry, the function introduces a random wait time using the asyncio.sleep method. This random wait duration, ranging from 1 to 5 seconds, helps prevent rapid and frequent retries that could potentially exacerbate the request failures.
The function takes two arguments: link and page. The page argument represents the Playwright page object used to perform the request, while the link argument denotes the URL to which the request is made.
By utilizing this function, you can ensure that requests are retried in case of temporary failures, improving the overall reliability of your web scraping process.
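Putting that together, perform_request_with_retry might look like the sketch below. The MAX_RETRIES value and the parameter order are assumptions; the 1-5 second random back-off matches the description above:

```python
MAX_RETRIES = 5  # assumed value for the constant referenced in the article

async def perform_request_with_retry(page, link):
    """Navigate to `link`, retrying up to MAX_RETRIES times on failure."""
    retry_count = 0
    while retry_count < MAX_RETRIES:
        try:
            await page.goto(link)
            return
        except Exception:
            retry_count += 1
            if retry_count == MAX_RETRIES:
                raise  # give up after the final attempt
            # Random 1-5 second back-off between retries.
            await asyncio.sleep(random.uniform(1, 5))
```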
In the final step, we call the functions defined above and collect the extracted data into lists.
The provided Python script demonstrates the usage of an asynchronous function named "main" to scrape product information from Decathlon's product pages. It leverages the Playwright library to launch a Firefox browser and navigate to the Decathlon site.
The "main" function follows these steps:
1. It utilizes the "get_product_urls" function to extract the URLs of each product from the Decathlon page, storing them in a list called "product_urls".
2. It then iterates through each product URL in the "product_urls" list.
3. For each URL, it uses the "perform_request_with_retry" function to load the product page, making use of a retry mechanism to handle temporary failures.
4. It extracts various information such as the product name, brand, star rating, number of reviews, MRP, sale price, color, features, and product information from each product page.
5. The extracted information is stored as a tuple in a list called "data".
6. After processing every 10 product URLs, it prints a progress message.
7. Once all the product URLs have been processed, it prints a completion message.
8. The data in the "data" list is converted to a Pandas DataFrame.
9. The DataFrame is saved as a CSV file using the "to_csv" method.
10. Finally, the script closes the browser instance using the "browser.close()" statement.
To execute the script, the "main" function is called using the "asyncio.run(main())" statement, which runs the "main" function as an asynchronous coroutine.
By running this script, you can scrape product information from Decathlon's product pages, store it in a structured format, and save it as a CSV file for further analysis or processing.
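A condensed sketch of main, wiring together the helpers sketched earlier; the start URL, the column names, and the exact subset of attributes collected are illustrative:

```python
async def main():
    async with async_playwright() as p:
        browser = await p.firefox.launch(headless=True)
        page = await browser.new_page()

        # Start page is illustrative -- point it at the category listing.
        await perform_request_with_retry(page, "https://www.decathlon.com/")
        product_urls = await get_product_urls(browser, page)

        data = []
        for i, url in enumerate(product_urls, start=1):
            await perform_request_with_retry(page, url)
            data.append((
                url,
                await extract_product_name(page),
                await extract_brand(page),
                await extract_star_rating(page),
                await extract_sale_price(page),
                await extract_mrp(page),
                await extract_product_description(page),
                await get_product_information(page),
            ))
            if i % 10 == 0:
                print(f"Processed {i} of {len(product_urls)} products")

        print("Scraping complete")
        df = pd.DataFrame(data, columns=[
            "Product URL", "Product Name", "Brand", "Rating",
            "Sale Price", "MRP", "Description", "Product Information",
        ])
        df.to_csv("decathlon_products.csv", index=False)
        await browser.close()

asyncio.run(main())
```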
In today's competitive business landscape, having access to accurate and up-to-date data is crucial for making informed decisions. Web scraping offers a valuable solution to extract data from websites like Decathlon, providing valuable insights into market trends, pricing, and competitor analysis.
Businesses can automate collecting data from Decathlon's website using tools like Playwright and Python. This enables them to gather information such as product offerings, pricing, reviews, and more, which can be used to gain a competitive edge and drive growth.
Partnering with a reputable web scraping company like Actowiz Solutions can further enhance the benefits of web scraping. Actowiz Solutions offers tailored web scraping solutions that cater to specific business needs, providing access to comprehensive and relevant data. From product details to pricing information and customer reviews, Actowiz Solutions enables brands to understand their industry and competition deeply.
By leveraging the power of web data, businesses can make data-driven decisions, optimize their operations, and drive profitability. Actowiz Solutions can provide the necessary expertise and tools to extract valuable data, whether for product development, market research, or marketing campaigns.
If you're ready to unlock the potential of web data for your brand, reach out to Actowiz Solutions, the experts in web scraping. Contact us today to explore how web scraping can revolutionize your business and give you a competitive advantage.
For all your web scraping, mobile app scraping, or instant data scraper service needs, Actowiz Solutions is here to assist you. Our team of experts is skilled in extracting data from various sources, including websites and mobile applications. Whether you need to gather market data, competitor information, or any other specific data set, we have the expertise and tools to deliver accurate and reliable results.
✨ "1000+ Projects Delivered Globally"
⭐ "Rated 4.9/5 on Google & G2"
🔒 "Your data is secure with us. NDA available."
💬 "Average Response Time: Under 12 hours"