How to Automate Data Collection From Sites

Do you want to automate data collection from sites, whether to collect what the top articles on Google say, to see how the top-ranking paragraphs of certain posts are written, or simply to gather the top-ranking titles for a specific Google search?

In this guide, we show you how to do exactly that, even without much coding experience.

What’s Needed to Automate Data Collection From Sites

Frequent, automated data collection requires you to stay anonymous, so that the likes of Google don’t start seeing you as suspicious.

For that, you will need Kameleo. Kameleo changes your browser fingerprint so that sites collecting data about you get the wrong data, meaning you will be able to carry out data collection in the long term without being blacklisted or marked as suspicious.


If you don’t know what browser fingerprinting is, we covered it in several posts in the past. But to sum up briefly: only 1 in 286,777 browsers share the same fingerprint, meaning it’s incredibly easy for websites and platforms like Google to identify you, especially when you are carrying out many operations at high speed.

And it’s things as simple as your screen resolution and browser version that combine into a fingerprint so distinctive that only 1 in 286,777 people share it.

Kameleo will protect you on the fingerprinting front, which is important if you plan to automate your activities. But you also need to change your IP address, which sites check alongside your browser fingerprint, and for that you need either a VPN or a proxy.

Lastly, for the actual automation part, you will need to either install Selenium for your coding environment, if you already work with one, or use UI.Vision, a Google Chrome and Firefox extension that allows you to automate data collection from sites without any coding experience. How? You can pretty much record your activities and turn them into code.
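If you go the Selenium route, a few lines are enough to get a controlled browser running. Below is a minimal Python sketch, assuming Chrome is installed and the selenium package is available (pip install selenium):

  from selenium import webdriver

  # Launch a Chrome window under Selenium's control. Recent Selenium
  # versions download a matching chromedriver automatically.
  driver = webdriver.Chrome()
  driver.get("https://www.google.com")  # navigate somewhere to start

  # ... your data-collection steps go here ...

  driver.quit()  # close the browser when finished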

Automate Data Collection From Sites – The Staying Anonymous Part

After you’ve installed Kameleo, power it up and set up your new profile, adjusting the settings to your liking. The great thing about the Kameleo platform is that whenever you hover your mouse over something you aren’t sure about, it will be explained to you.

At this point, it’s also important to connect to a proxy directly through Kameleo. If you use a VPN instead, you can look up proxy details in your provider’s settings, or alternatively just launch your VPN app and connect.
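If you’re scripting with Selenium instead, you can point the controlled browser at a proxy when launching it. A quick sketch, where proxy.example.com:8080 is a placeholder for your provider’s host and port:

  from selenium import webdriver
  from selenium.webdriver.chrome.options import Options

  options = Options()
  # Placeholder address -- substitute your proxy provider's details.
  options.add_argument("--proxy-server=http://proxy.example.com:8080")
  driver = webdriver.Chrome(options=options)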

Here’s a video where we show you how to spoof your browser fingerprint:

Automating Data

Once you have Kameleo turned on and a new browser window opens, download UI.Vision. Then click on the extension icon, which should be visible at the top right of your browser.

If you are simply looking to collect the titles of top-ranking blog posts, we already made a blog post that dives into the specifics of that task. In this example, though, we’ll show you how to capture screenshots of the top-ranking blog posts for a desired search.

Here’s how to do it:

There are two ways to go about it. You can either automate the browser navigating to a specific site, or you can open the site you want to access manually. We prefer typing it in manually, as that means we don’t have to change the code every time we want to run this process.

So first, go to the site you want to collect data from.


In our case, we wanted to see what sites mentioning the phrase “Disney+ Europe” looked like, so we searched for that phrase on Google. To get UI.Vision to do this for us, we would press Add, followed by an Open command with the URL of the search we want in the target area.
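The Selenium equivalent of that Open command is a single driver.get() call. A sketch, reusing the driver from earlier; quote_plus() URL-encodes the query so the literal + in “Disney+ Europe” survives:

  from urllib.parse import quote_plus

  query = "Disney+ Europe"
  # quote_plus() encodes spaces and the literal + so Google parses the
  # query the same way as if it were typed into the search box.
  driver.get("https://www.google.com/search?q=" + quote_plus(query))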

Next, you will need to add a Click command. To the right of the target area, you will see the Select option. Click on it, then, in your browser, select the first proper header that comes up that isn’t an ad.


Save what the target area now states, then repeat the Select step on the second header.

Now compare the two pieces of code that you get.

In our case, this is what the pieces of code stated:
  1. xpath=//*[@id="rso"]/div[3]/div/div[1]/div/div/div[1]/a/h3
  2. xpath=//*[@id="rso"]/div[3]/div/div[2]/div/div/div[1]/a/h3

That right away told us that the only difference between headers is the second bracketed index from the right: [1] for the first result and [2] for the second.
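As an aside, if you were scripting this with Selenium, you could sidestep the index comparison entirely and grab every result heading in one query. A sketch, assuming the driver is already on the results page; note that the “rso” container and h3 markup are observations from our results page and can change at any time:

  from selenium.webdriver.common.by import By

  # Organic result titles render as <h3> elements inside the #rso container.
  headings = driver.find_elements(By.XPATH, '//*[@id="rso"]//a/h3')
  for h in headings:
      print(h.text)  # the title of each top-ranking result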

We are now going to replace that number with ${!LOOP}, UI.Vision’s built-in loop counter.

The code will now read: xpath=//*[@id="rso"]/div[3]/div/div[${!LOOP}]/div/div/div[1]/a/h3

This allows the macro to click on a different header with each pass of the loop.
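In Python terms, ${!LOOP} is simply a counter substituted into the XPath on each pass. A sketch of the same idea, using the path from above:

  # The surrounding path comes from the Select output above and may
  # look different on your own results page.
  template = '//*[@id="rso"]/div[3]/div/div[{i}]/div/div/div[1]/a/h3'
  for i in (1, 2, 3):
      print(template.format(i=i))  # the XPath of the 1st, 2nd, 3rd header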

Next, add a captureEntirePageScreenshot command, with the file name you want in the target area.

Since we are going to be saving a bunch of screenshots, we are going to append ${!LOOP} to the end of the name.

In our instance, we simply named the file: hellohi${!LOOP} so that every time a screenshot was captured, it would be called hellohi1, hellohi2, hellohi3, and so on.

At the end of the macro, we added a Pause command with 3000 in the target area. This results in a 3-second (3,000 ms) pause before the next action executes.
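For completeness, here is what the whole loop might look like on the Selenium side: click a header, save a screenshot, pause, and go back for the next one. This is a sketch built on the assumptions above; also note that Selenium’s save_screenshot() captures only the visible viewport, unlike UI.Vision’s captureEntirePageScreenshot:

  import time
  from selenium.webdriver.common.by import By

  template = '//*[@id="rso"]/div[3]/div/div[{i}]/div/div/div[1]/a/h3'
  for i in range(1, 6):  # the first five results
      driver.find_element(By.XPATH, template.format(i=i)).click()
      driver.save_screenshot(f"hellohi{i}.png")  # hellohi1.png, hellohi2.png, ...
      time.sleep(3)  # the 3-second pause between actions
      driver.back()  # return to the results page for the next header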

Just like that, you have automated data collection from sites.

Lastly, just press Play Loop, which is located to the right of Play Macro.

That’s How to Automate Data Collection from Sites

UI.Vision is an incredible platform for automating data collection while being incredibly easy to use, as you saw in this simple guide. Just like that, you can automate the collection of screenshots. And imagine what you could do if you experimented with its other commands, which are all explained very well. With a few days of learning on YouTube and forums, and with Kameleo’s help to avoid being flagged as suspicious or a bot, the possibilities are endless.

Kameleo Team

Our team consists of IT security experts, professional developers, and privacy enthusiasts who are always searching for better ways to protect browser fingerprints and developing innovative tools for browser automation and web scraping.