What is a user agent?

In the context of web browsing, a user agent is the string a client sends in the User-Agent HTTP header to tell the server what software, and often what device and operating system, is making the request.

For example, if you visit google.com in Chrome on an iPhone, the browser sends a User-Agent header that identifies the device as an iPhone and the browser as Chrome, usually along with extra details such as version numbers.

Similarly, if you visit google.com in Firefox on a Linux computer, or in Safari on a Mac, the browser attaches a User-Agent header describing that combination.

How is a user agent relevant to web scraping?

Because the client decides what goes into the User-Agent header, a scraper can change it to get past some simple blocking techniques, to make requests look like they come from ordinary browsers, and more.

For example, run the command-line tool curl with verbose output (-v):

curl -v google.com

The verbose output shows the request curl sends, which looks something like this:

> GET / HTTP/1.1
> Host: google.com
> User-Agent: curl/7.47.0
> Accept: */*

Note the User-Agent header: curl attaches it to tell the target server what kind of application is connecting. Google Chrome on Linux, by comparison, sends a header such as this:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/123.45 (KHTML, like Gecko) Chrome/12.0.1234.12 Safari/537.36
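Programming libraries send a default user agent too, just as curl does. As an illustration using Python's standard library (Python is chosen here only as an example language), urllib keeps its default headers in the opener's addheaders list, and the user agent it announces starts with Python-urllib/ followed by the interpreter version. A scraper built on it is therefore just as easy to spot as one built on curl.

```python
import urllib.request

# build_opener() returns an OpenerDirector whose addheaders list holds the
# headers urllib adds to requests that do not set those headers themselves.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# Prints "Python-urllib/" followed by the interpreter's major.minor version.
print(default_headers["User-agent"])
```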

!!! Some servers and websites block requests based on the user agent alone.

Some websites block every request whose User-Agent contains curl, on the assumption that it comes from an automated bot. One way around this kind of blocking is simply to send a different user agent, using curl's -H option to set the header:

curl -H "User-Agent: Firefox xyz" -v google.com

The User-Agent header can hold any string you like. If you are writing a web crawler, consider a user agent that names your bot and points to your own website, or to an opt-out web form for site owners.
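A self-identifying crawler user agent can be assembled from a name, a version and a contact URL, borrowing the "+URL" convention seen in crawler user agents such as Googlebot's. The bot name and URL below are hypothetical placeholders, and crawler_user_agent is an illustrative helper rather than part of any library.

```python
import urllib.request


def crawler_user_agent(name: str, version: str, info_url: str) -> str:
    """Format a self-identifying user agent: Name/Version (+info URL)."""
    return f"{name}/{version} (+{info_url})"


# Hypothetical bot name and URL: point info_url at a page describing your
# crawler, or at your opt-out form.
UA = crawler_user_agent("ExampleBot", "1.0", "https://example.com/bot")
print(UA)  # ExampleBot/1.0 (+https://example.com/bot)

# Attach it to every request the crawler builds.
req = urllib.request.Request("https://example.com/", headers={"User-Agent": UA})
```

Site owners who see this string in their logs can follow the URL to find out who is crawling them and how to opt out.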

If you are collecting data and need your requests to resemble ordinary users, try rotating through a list of user agents, picking one at random for each request, such as those in the list linked below.
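Rotation can be as simple as picking a random entry for each request. This is a minimal sketch assuming Python's urllib; the three strings in USER_AGENTS are example browser user agents that will go out of date, so substitute a current list. The optional rng parameter (a hypothetical design choice) lets you pass a seeded random.Random for reproducible runs.

```python
import random
import urllib.request

# Example pool only: replace with an up-to-date list of real browser user agents.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


def build_request(url, rng=random):
    """Return a Request whose User-Agent is chosen at random from USER_AGENTS."""
    return urllib.request.Request(url, headers={"User-Agent": rng.choice(USER_AGENTS)})


# Each request may go out with a different user agent.
for _ in range(3):
    req = build_request("https://www.google.com/")
    print(req.get_header("User-agent"))
```

Changing the user agent alone will not defeat blocking that also looks at request rate, IP address or other headers, so rotation works best as one measure among several.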

A list of random user agents for use in your web scraping bot

Want a useful list of user agents? Check out this list.
