How to prevent and avoid ReCaptcha

Learn about ReCaptcha prevention and how to avoid it

17 min
advanced
08-May-2019

Agenda

  • Understanding ReCaptcha
  • Preventing ReCaptcha using browser fingerprints and proxy manipulations
  • How to overcome ReCaptcha with Luminati proxy manager automation
  • ReCaptcha solver
  • ReCaptcha unblocker

Solve and prevent ReCaptcha

Welcome to Luminati’s webinar on how to prevent and solve ReCaptcha. Today’s main topics are:

  • Understanding ReCaptcha
  • Preventing ReCaptcha using browser fingerprints and proxy manipulations
  • How to overcome ReCaptcha with Luminati proxy manager automation
  • ReCaptcha solver
  • ReCaptcha unblocker

The first captchas showed distorted but human-readable text. Google’s ReCaptcha later added image challenges such as traffic lights, crosswalks, fire hydrants, stairs and chimneys. ReCaptcha V2 was introduced in 2014 and V3 in 2018, the same year Google shut down ReCaptcha V1. About ReCaptcha types:

  • ReCaptcha V2 requires the user to click on a checkbox.
  • Invisible ReCaptcha V2 is prompted by suspicious traffic activity. The site owner is notified and then chooses how they would like it to be handled.
  • ReCaptcha V3 is an enhancement of invisible ReCaptcha that analyzes the user’s interaction with the browser together with the device fingerprints, and then sends a score to the site owner.

When choosing how to handle suspicious activity, the site owner can decide to block, cloak, require additional authentication or blacklist the IP entirely.

What does ReCaptcha really do? In addition to presenting pictures to solve, ReCaptcha adds cookies and collects the browser and device profile, known as the device fingerprint. Advanced fingerprinting analyzes mouse movements and even the audio signal captured from the device.

What actions should be taken to prevent ReCaptcha or getting blocked? Most websites first check the IP integrity, a check you can pass by using a real user IP, also known as a Residential IP.

Start by choosing Residential as your network. Luminati has two types of Residential IPs, static and rotating. The rotating residential IPs are real users’ device IPs that are rotated every time a certain IP is no longer idle. Static residential IPs are issued by an ISP for commercial use and are permanent. These IPs behave like their rotating counterparts, except that they do not rotate, as they are only used by the customer who purchased them. For example, some social media and ticketing sites check the persistence of the account IP; in this case, you should use a Static residential IP. To select Static residential as the IP type, go to the Luminati Dashboard, click the Zones tab, add a zone and select Static residential as the Network type. Several target websites introduce ReCaptcha only on ‘sensitive’ pages, such as registration pages or pages for posting information.

In such use cases, we recommend the waterfall solution. Start by sending requests with data center IPs, then route the request to a residential IP when crawling the sensitive pages by creating rules in the Proxy Manager. The waterfall method gives you a high success rate and cost-effectiveness at the same time. For example, start by sending requests through the Datacenter network. If a request fails, that same request is automatically sent through the residential network. If it fails again, it is automatically routed through the mobile network. Another way to use the waterfall method is to route requests between different geo-locations. This is useful when scraping e-commerce product pages triggers ReCaptcha. You can change the IP from the US, for example, to an IP from Europe or Canada, which can also work to overcome ReCaptcha.
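The waterfall order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tier names, `waterfall_fetch`, and the `fetch` callable are hypothetical and are not part of the Proxy Manager, which implements this routing through its own rules.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical tier names, ordered from cheapest to most expensive network.
TIERS = ("datacenter", "residential", "mobile")


def waterfall_fetch(
    url: str,
    fetch: Callable[[str, str], Optional[str]],
    tiers: Sequence[str] = TIERS,
) -> Tuple[str, str]:
    """Try each network tier in order; return (tier, body) for the first success.

    `fetch(url, tier)` is supplied by the caller. It should send the request
    through the given tier and return the body, or None if the request failed.
    """
    for tier in tiers:
        body = fetch(url, tier)
        if body is not None:
            return tier, body
    raise RuntimeError(f"all network tiers failed for {url}")


def fake_fetch(url: str, tier: str) -> Optional[str]:
    """Stand-in fetcher for demonstration: pretends the datacenter tier is blocked."""
    return None if tier == "datacenter" else f"<html>{url} via {tier}</html>"


if __name__ == "__main__":
    print(waterfall_fetch("https://example.com/register", fake_fetch))
```

Passing the fetcher in as a parameter keeps the escalation logic separate from however each network is actually reached, so the same loop also works for a geo-location waterfall by passing country codes as the tiers.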

Let's take a close look at implementing a URL rule to rotate between different networks. Each proxy port in the Proxy Manager should be allocated a different network type. Once your ports are ready, create a new rule that is triggered when a request hits the URL you’ve chosen, in order to switch to a residential IP. In the Action drop-down list, select ‘Retry with new proxy port’, which will initiate waterfall routing. Then, in the Retry section, use the different-port drop-down list to select the relevant port with the Residential zone.

Many simple target sites just analyze the browser profile, such as the headers, and then present a visible ReCaptcha. To overcome this, send real browser headers and change them with each request. Setting the browser header values can be done in the Luminati Proxy Manager, under the Headers tab found in the settings of each proxy port. To use different user-agents, select ‘Yes’ in the Random user-agent field and a random user-agent will be added to each of your requests. You can also manually add any header, such as Cookie or Accept-Language, by typing in the name and value according to your target website’s headers. When working with the Proxy Manager API, you can set the required headers by creating a new proxy port with a POST request or by updating an existing port with a PUT request, sending the JSON configuration of the port in either case. The relevant field of the configuration is the Headers array, which holds the name and value of each header field. For example, for a particular site, you would put ‘cookie’ in the name field and the cookie string itself in the value field. This way you can send the same or different cookie values per request and per session, populating them from a database of cookies that are relevant to your target website.
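As a rough illustration of the headers array described above, the Python sketch below builds a port configuration with name/value header entries and a randomly chosen user-agent. Only the idea of a headers array with name and value fields comes from the webinar; the `port` key, the overall layout, the function name and the sample user-agent strings are assumptions, so check the Proxy Manager API reference for the real schema before sending anything.

```python
import json
import random
from typing import Dict, List

# Hypothetical pool of user-agent strings; replace with current, real browser values.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/12.1 Safari/605.1.15",
]


def build_port_config(port: int, cookie: str, rng: random.Random) -> Dict:
    """Build a port configuration whose headers array holds name/value pairs."""
    headers: List[Dict[str, str]] = [
        {"name": "user-agent", "value": rng.choice(USER_AGENTS)},
        {"name": "accept-language", "value": "en-US,en;q=0.9"},
        {"name": "cookie", "value": cookie},
    ]
    # The "port" key and top-level layout are assumptions, not the documented schema.
    return {"port": port, "headers": headers}


if __name__ == "__main__":
    config = build_port_config(24000, "session=abc123", random.Random())
    print(json.dumps(config, indent=2))
```

The resulting JSON would be the body of the POST or PUT request mentioned above; the cookie value could come from a per-site cookie store instead of a literal string.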

You can do the same in Puppeteer or Selenium by including the browser headers in your code. Another very useful proxy manipulation is resolving the DNS on the peer side rather than at the Super Proxy. What is DNS resolution? DNS translates a domain name into an IP address. The advantage of resolving DNS at the Super Proxy is a faster request. Resolving DNS on the peer side allows for greater anonymity, especially when using a bot or crawler. This can be done by going to the Request speed tab in the Proxy Manager and selecting ‘Remote resolve by the peer’ in the DNS lookup field.

For target websites that implement ReCaptcha V3 or use sophisticated analysis capabilities, it is important to pay attention to fingerprints such as the user’s mouse movements, webRTC rendering, audio signal analysis and more. Audio analysis can be overcome by adding or populating the audio signal or file in your requests. For example, you can use a text-to-speech tool to create the required audio and include it in your request.

When populating proper audio as part of the request, we have noticed that ReCaptcha V3 does not appear. Audio analysis has even replaced Canvas image rendering on many popular sites, and in our tests we were able to create accounts on popular target websites merely by populating the audio file. For a broader understanding of what a device fingerprint is, you can watch our previous webinar on device fingerprints, available at luminati.io/webinar/device-fingerprints. To generate or disable some of the fingerprints, you can use a Multilogin browser profile, which is able to mask and disable fingerprint properties. For example, you can set the browser time zone or geolocation to match the proxy IP. Multilogin also populates the proxy IP into webRTC, so that webRTC leaks the proxy IP and not your real IP.

Other fingerprint parameters can be populated as part of a Multilogin browser profile, such as canvas, static audio noise, webGL and more.

How do you overcome ReCaptcha without interrupting mass crawling operations? Start by retrying with a new IP when ReCaptcha is presented. This can be done in the Rules tab of the Proxy Manager by creating a new rule. Set the trigger to ‘HTML body element’ and, for ‘string to be scanned’, type in the word ReCaptcha or whatever appears in the browser console. Then select ‘retry with new IP’, set the number of retries you would like and, last, test the rule. You can learn more about the Luminati Proxy Manager in our Luminati FAQ and by watching our previous webinar, Becoming a Proxy Manager Expert.
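For readers who want to see the logic behind the body-scan rule, here is a minimal Python sketch of the same idea: scan the response body for a marker string and retry when it is found. The function name and the `fetch` callable are hypothetical; in practice the Proxy Manager rule does this for you, and `fetch` stands in for whatever sends the request through a fresh IP.

```python
from typing import Callable


def fetch_until_clear(
    url: str,
    fetch: Callable[[str, int], str],
    marker: str = "recaptcha",
    max_retries: int = 3,
) -> str:
    """Refetch `url` until the body no longer contains `marker`.

    `fetch(url, attempt)` is supplied by the caller and should send each
    attempt through a different IP (for example, a new proxy session).
    The marker match is case-insensitive.
    """
    needle = marker.lower()
    for attempt in range(max_retries + 1):
        body = fetch(url, attempt)
        if needle not in body.lower():
            return body
    raise RuntimeError(f"still blocked after {max_retries} retries: {url}")


def fake_fetch(url: str, attempt: int) -> str:
    """Stand-in fetcher for demonstration: the first two attempts hit a captcha page."""
    if attempt < 2:
        return "<html><div class='g-recaptcha'></div></html>"
    return "<html>product page</html>"


if __name__ == "__main__":
    print(fetch_until_clear("https://example.com/item/1", fake_fetch))
```

Capping the retries matters in mass crawling: a page that always serves a captcha should fail fast rather than burn through IPs.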

Another option is to actually solve the captcha by integrating with a third-party captcha solver such as 2captcha or anti-captcha. These services use real people who manually solve the captcha and send you back the result. Integrating with captcha solver services is cumbersome, since you need to detect when the captcha is present and then implement a complex API to send the request. The main constraint of these services is the response time, which averages 40 to 60 seconds until the captcha is solved and sent back. Many customers have asked us how to unblock ReCaptcha. Well, the best way to solve a captcha is not to get one! This week we launched the Luminati Unblocker, the first automated unblocking software to reach your target site with a 100% success rate.
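To make the cost of the third-party solver route concrete, the Python sketch below shows the general submit-then-poll pattern such an integration usually needs. It does not reproduce any particular vendor's API: `submit` and `poll` are hypothetical callables that you would implement against your solver's documented endpoints, and the timeout and interval values are assumptions chosen around the 40 to 60 second average mentioned above.

```python
import time
from typing import Callable, Optional


def solve_captcha(
    submit: Callable[[], str],
    poll: Callable[[str], Optional[str]],
    timeout: float = 120.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Submit a captcha task, then poll until a token arrives or `timeout` passes.

    `submit()` returns a task id; `poll(task_id)` returns the solved token,
    or None while the human solver is still working.
    """
    task_id = submit()
    deadline = clock() + timeout
    while clock() < deadline:
        token = poll(task_id)
        if token is not None:
            return token
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"captcha task {task_id} not solved within {timeout} seconds")
```

Even in this stripped-down form, every solved captcha blocks the crawler for many seconds, which is why avoiding the captcha in the first place scales better.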

The Luminati Unblocker allows you to send us a simple request and let us handle the rest. We have already automated all the required optimizations and proxy manipulations; just send us your request and we will send you 100% accurate results. How have we accomplished this? We rotate through our multiple networks, manage the headers and cookies, implement country discovery and targeting, detect blocked requests based on response codes, pay attention to content and request time, and then match the relevant solution to unblock the request. All of this is done automatically! If you are interested in the Luminati Unblocker, contact your Luminati account manager.

Now I see that we have received a few questions:

  • Will the webinar be recorded? Are you going to share it afterwards?
  • Yes, the webinar and presentation will be available on the Luminati Webinars page, together with our previous webinars.
Another question:

  • Does Luminati service resolve Recaptcha V2?
  • No, Luminati does not solve the ReCaptcha. Instead, we offer the Luminati Unblocker, which automates all the necessary proxy manipulations to deliver the results you need.
  • How do I work or integrate with the unblocker?
  • You merely send us a curl request with your target site over the Luminati API, and we send you back the results.
  • Do I need to make any changes on my side for the unblocker?
  • No, you just add a very simple request to your existing Luminati API integration.
  • Do you support the collection of price comparison data?
  • Yes, the unblocker can help collect pricing data for e-commerce and travel, profile data for account management, and the majority of web data extraction needs.

For more information, or to start working with the unblocker today, contact your Luminati account manager or email sales@luminati.io. If you have any more questions, please write them in the small box at the bottom of the screen and they will be answered after the webinar. Thank you for attending; we hope you enjoyed our Luminati webinar on preventing and solving ReCaptcha. Have a lovely day.