What are HTTP requests and responses?

Understanding HTTP requests, responses and headers is critical to developing a powerful web scraper.

What is an HTTP request?

An HTTP request is what the client (e.g. a web browser, web scraper bot or app) sends to a server in order to request data (e.g. to load a web page or submit a form's data).

Beyond the initial connection to a server, HTTP is the protocol that clients and servers use to communicate. Each HTTP request is one message in that protocol, asking the server to load, send or receive data for a website or application.

In its simplest form, an HTTP request looks something like this:

GET / HTTP/1.1
Host: www.google.com

Generally, HTTP requests also include other headers, such as User-Agent and Cookie.

GET / HTTP/1.1
Host: www.google.com
User-Agent: My Awesome Browser Version 0.001 beta
Cookie: password=hereismypassword

Often there will be a dozen or so headers, some of them set by the website, that are needed to navigate a website or app correctly. It is therefore useful to look at the HTTP headers being sent and received under the hood when building a web scraper.
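As a rough illustration of how the request text shown above is put together, here is a minimal Python sketch. The helper name build_request is hypothetical and not part of any library; it only assembles the text and does not send anything over the network.

```python
def build_request(method, path, host, headers=None):
    """Assemble the raw text of an HTTP/1.1 request (hypothetical helper)."""
    # The request line comes first, followed by the mandatory Host header.
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    # Any extra headers (User-Agent, Cookie, ...) follow, one per line.
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Lines are separated by CRLF; a blank line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


raw_request = build_request(
    "GET",
    "/",
    "www.google.com",
    {"User-Agent": "My Awesome Browser Version 0.001 beta"},
)
print(raw_request)
```

Printing the request like this is a simple way to check exactly which headers a scraper would send before pointing it at a real site.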

What is an HTTP response?

An HTTP response is what the server sends back to the client in reply to a request. Generally this means sending back a web page.

In its simplest form, an HTTP response looks something like this:

HTTP/1.1 200 OK
Content-Type: text/html

Often responses will contain a number of other headers, such as Set-Cookie (to set a cookie after logging in or similar), Cache-Control, Date and more.
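To make the structure of a response concrete, the following Python sketch splits raw response text into its status code, reason phrase, headers and body. The function name parse_response is hypothetical, and the sketch assumes a well-formed response with CRLF line endings and a reason phrase on the status line; a real scraper would normally rely on an HTTP library instead.

```python
def parse_response(raw):
    """Split raw HTTP response text into (code, reason, headers, body)."""
    # The blank line separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # Status line, e.g. "HTTP/1.1 200 OK": version, code, reason phrase.
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        # Simplification: a repeated header (e.g. Set-Cookie) overwrites
        # the earlier value in this dictionary.
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


raw_response = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
code, reason, headers, body = parse_response(raw_response)
print(code, reason, headers["content-type"])
```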

It is useful to understand what the various HTTP responses mean in order to write a scraper bot that crawls the web correctly. For example, the bot may need to send back cookies that the server has set in order to access certain types of data or API endpoints.
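As one possible sketch of the cookie handling mentioned above, the snippet below uses Python's standard http.cookies module to turn the value of a Set-Cookie response header into the value of a Cookie request header. The function name cookie_header_from and the sessionid value are made up for illustration, and the sketch ignores attributes such as expiry and domain that a real scraper should respect.

```python
from http.cookies import SimpleCookie


def cookie_header_from(set_cookie_value):
    """Build a Cookie request header value from a Set-Cookie value."""
    jar = SimpleCookie()
    # load() parses the cookie name and value along with attributes such
    # as Path and HttpOnly, which are not sent back to the server.
    jar.load(set_cookie_value)
    # Only the name=value pairs go into the Cookie request header.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


cookie_value = cookie_header_from("sessionid=abc123; Path=/; HttpOnly")
print("Cookie:", cookie_value)
```

Sending this header on later requests is what lets the server recognise the scraper as the same logged-in client.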
