Using Scrapy with proxies


I’m currently working on scraping some websites. I used to develop in PHP, but when I searched for the best scraping / crawling tool, I found Scrapy (written in Python) to be the best.

You can read more about it and how to get started here:

I searched a lot for how to use proxies with Scrapy but couldn’t find a simple, straightforward way to do it. Everything talks about middlewares and the Request object, but not about how to actually use them.

So, here are the steps to use Scrapy with proxies:

1 – Create a new file called “” in your Scrapy project (step 2 refers to it as the module project_name.middlewares) and add the following code to it:

# base64 is needed ONLY if the proxy you are going to use requires authentication
import base64

# Start your middleware class
class ProxyMiddleware(object):
    # Override process_request, which Scrapy calls for every outgoing request
    def process_request(self, request, spider):
        # Set the location of the proxy
        request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"

        # Use the following lines if your proxy requires authentication
        proxy_user_pass = "USERNAME:PASSWORD"
        # Set up basic authentication for the proxy.
        # b64encode works on bytes and, unlike the removed encodestring,
        # does not append a trailing newline to the header value.
        encoded_user_pass = base64.b64encode(proxy_user_pass.encode('utf-8')).decode('ascii')
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass
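
To see what actually ends up in the Proxy-Authorization header, here is a minimal standalone sketch using only the standard library. The function names proxy_auth_header and proxy_url_with_credentials are my own, not part of Scrapy. The second helper shows an alternative I believe Scrapy's built-in HttpProxyMiddleware accepts, namely putting the credentials inside the proxy URL itself; treat that as an assumption and check it against the Scrapy version you run.

```python
import base64
from urllib.parse import quote


def proxy_auth_header(username, password):
    """Return the value for a Proxy-Authorization header (HTTP Basic scheme)."""
    credentials = "{}:{}".format(username, password).encode("utf-8")
    # b64encode returns bytes with no trailing newline; decode to get a str
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def proxy_url_with_credentials(host, port, username, password):
    """Return a proxy URL with percent-encoded credentials embedded in it."""
    # quote(..., safe="") escapes characters such as "@" and ":" that would
    # otherwise be confused with the URL's own delimiters
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return "http://{}:{}@{}:{}".format(user, secret, host, port)
```

In the middleware, the first helper's result would be assigned to request.headers['Proxy-Authorization'], while the second helper's result would be assigned to request.meta['proxy'] in place of the plain proxy address.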

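2 – Open your project’s configuration file (under ./project_name/) and add the following code. The module path below is the one used since Scrapy 1.0; older releases used scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware instead.

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    'project_name.middlewares.ProxyMiddleware': 100,
}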
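
The numbers matter: Scrapy calls process_request on downloader middlewares in increasing order of these values, so the custom middleware (100) sets the proxy before the built-in one (110) sees the request. The sketch below only illustrates that ordering with a plain dictionary; request_order is a hypothetical helper, not a Scrapy function, and the built-in middleware is written with the module path used by Scrapy 1.0 and later.

```python
# Illustrative copy of the two middleware entries and their priorities
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    'project_name.middlewares.ProxyMiddleware': 100,
}


def request_order(middlewares):
    """Return middleware paths sorted by ascending priority number."""
    ordered = sorted(middlewares.items(), key=lambda item: item[1])
    return [path for path, _priority in ordered]


# The custom ProxyMiddleware comes first because 100 < 110
for path in request_order(DOWNLOADER_MIDDLEWARES):
    print(path)
```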

Now your requests should be routed through this proxy. Simple, isn’t it?

If you want to test it, just create a new spider named test and add the following code:

import scrapy

# A plain Spider is used here: CrawlSpider reserves parse() for its own rule handling
class TestSpider(scrapy.Spider):
    name = "test"
    domain_name = ""
    # The following url is subject to change, you can get the last updated one from here :
    start_urls = [""]

    def parse(self, response):
        # Save the raw page; the context manager makes sure the file is closed
        with open('test.html', 'wb') as f:
            f.write(response.body)

Then cat test.html to find the IP.
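
If the saved page is long, picking the address out by eye gets tedious. As a rough sketch, assuming the page prints your address as a plain IPv4 string, the helper below pulls IPv4-looking values out of a block of text. find_ipv4_addresses is a name I made up for this example, and the pattern is deliberately loose.

```python
import re

# Loose IPv4 pattern: four dot-separated groups of 1-3 digits
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_ipv4_addresses(text):
    """Return unique, valid-looking IPv4 addresses in order of first appearance."""
    found = []
    for candidate in IPV4_PATTERN.findall(text):
        # Reject matches such as 999.1.1.1 whose octets are out of range
        octets_ok = all(0 <= int(part) <= 255 for part in candidate.split("."))
        if octets_ok and candidate not in found:
            found.append(candidate)
    return found
```

To use it, read test.html as text and pass the contents to find_ipv4_addresses. If the proxy is working, the address that comes back should be the proxy’s, not your own.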

For cheap / reasonable proxies, try the following websites :


About the author

Mahmoud M. Abdel-Fattah


