Lesson 2: Web Requests

HTTP (Hypertext Transfer Protocol) is the protocol that defines how data is requested and transmitted between clients (such as a visitor's browser) and servers (such as a website) over the internet.

In this lesson we will make these requests from a programming environment (Python), which is a useful skill for tasks such as web parsing, data science, machine learning, and automation.

This lesson is lightweight and focuses on the most common case: requesting data over HTTP with Python's requests module.

Function : requests.get()

The .get() function of the requests library will be used to make HTTP requests in this lesson.

Used Where?

For a variety of tasks such as data science, web parsing, data collection, machine learning, and automation, across many different domains: technology, finance, energy, e-commerce, B2B, B2C, logistics, life sciences, research, etc.

Syntax do(s)

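1) Remember to import the requests module first. It is a third-party package, so it may need to be installed (for example with pip install requests) before the import works.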

import requests

2) requests.get("url") makes an HTTP GET request to the given URL and returns the server's response.

Syntax don't(s)

1) requests.get("url") returns a Response object, and printing that object directly shows only the HTTP status code, for example <Response [200]>. Don't forget to add .text in order to read the actual content of the HTTP response.
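The HTTP code mentioned above is a standard status code, and each number has a conventional meaning. As a small aside, the sketch below uses Python's built-in http module to translate a few common codes into their standard phrases. The helper name describe_status is made up for this illustration and is not part of requests.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short label such as '200 OK' for a numeric HTTP status code."""
    # HTTPStatus raises ValueError for codes it does not know about.
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


# A few codes you are likely to meet when making requests.
for code in (200, 301, 404, 500):
    print(describe_status(code))
```

Codes in the 200 range signal success, 300 range redirects, 400 range client-side problems (such as a wrong URL), and 500 range server-side problems.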

Example 1

>>> import requests
>>> f = r"https://www.doctorswithoutborders.org"
>>> data = requests.get(f)
>>> print(data)

<Response [200]>

A 200 response means the request was successful. Usually, though, you'll want to see more than that. Example 2 shows how to use the .text attribute to extract the content of the response.

Example 2

>>> import requests
>>> f = r"https://www.doctorswithoutborders.org"
>>> data = requests.get(f)
>>> data = data.text
>>> print(data)

<!DOCTYPE html>
<html lang="en" dir="ltr" prefix="content: http://purl.org/rss/1.0/modules/content/ dc: http://purl.org/dc/terms/ foaf: http://xmlns…………….

The full output is too long to show here, but you will get the HTML source code of the page you requested. The output above is only a tiny part of it.
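In real scripts it often helps to guard the request a little. The following is a minimal sketch, not the only way to do it: fetch_page is a hypothetical helper name, and the 10-second timeout is an arbitrary assumption. It relies on two features of requests: the timeout argument of requests.get() and the raise_for_status() method of the response.

```python
import requests


def fetch_page(url, timeout=10):
    """Return the body of `url` as text, or raise if the request fails."""
    # timeout (in seconds) stops the call from waiting forever on a slow server.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx status codes instead of
    # quietly returning the text of an error page.
    response.raise_for_status()
    return response.text
```

For instance, fetch_page("https://www.doctorswithoutborders.org")[:100] would give the first 100 characters of the page source, provided the site is reachable.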

Doctors Without Borders is an amazing non-profit organization that brings medical services to the people most in need, wherever they are in the world. You can check out their website at https://www.doctorswithoutborders.org.

Tips

Usually requests.get() is only one part of the operation. To extract meaningful information from the response you will need one or more of the following tools:

1- Regex
2- Beautiful Soup
3- Selenium
4- SQL

Depending on the results you're after and how complex they are, a very simple piece of regex code may be enough, or you may need to build a more complicated solution.
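As an illustration of the simple regex case, the sketch below pulls the page title and the link targets out of a small HTML string. The sample_html content is invented for this example and stands in for whatever requests.get(url).text would return. Regex patterns like these are fragile on real-world HTML, which is one reason a parser such as Beautiful Soup is often preferred for anything beyond quick extractions.

```python
import re

# Hypothetical page source, standing in for requests.get(url).text.
sample_html = """<!DOCTYPE html>
<html lang="en">
<head><title>Example Page</title></head>
<body>
<a href="https://example.com/about">About</a>
<a href="https://example.com/contact">Contact</a>
</body>
</html>"""

# Capture the text between <title> and </title> (non-greedy match).
title_match = re.search(r"<title>(.*?)</title>", sample_html, re.IGNORECASE | re.DOTALL)
title = title_match.group(1) if title_match else None

# Collect every double-quoted href value.
links = re.findall(r'href="(.*?)"', sample_html)

print(title)
print(links)
```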

Advanced Concepts

There are a number of other methods you can use with the requests library.

1) requests.post(): used for sending data to the server for tasks such as filling in forms, uploading files, etc.

2) requests.put(): asks the server to store the data you send at a specific location, such as a predefined URL, replacing whatever is currently there.

3) requests.head(): requests only the headers of the response, without the body of the page.

4) requests.delete(): asks the server to delete the resource at the given URL.

5) requests.patch(): used to update only part of the data, unlike the .put() method, which is used for a complete update.

6) requests.options(): asks the server which HTTP methods and communication options it supports for the given URL.

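7) CONNECT: an HTTP method that establishes a tunnel to the server, typically through a proxy. Note that the requests library does not provide a requests.connect() function.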
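To get a feel for how these methods differ without contacting any server, requests can build a request and prepare it without sending it. The sketch below does that for a POST and a HEAD request. The URL and the form data are placeholders chosen for illustration only.

```python
import requests

# Hypothetical endpoint and form data, used only for illustration.
url = "https://example.com/form"
form = {"name": "Ada"}

# Build the requests without sending them, to inspect what
# requests.post(url, data=form) and requests.head(url) would transmit.
post_request = requests.Request("POST", url, data=form).prepare()
head_request = requests.Request("HEAD", url).prepare()

print(post_request.method, post_request.url)
print(post_request.headers["Content-Type"])  # how the form data is encoded
print(post_request.body)                     # the encoded form data
print(head_request.method, head_request.body)
```

The POST request carries the form data in its body, while the HEAD request has no body at all. To actually send them you would call requests.post(url, data=form) or requests.head(url), just as with requests.get().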

Exercises: na

Next Lesson: Try / Except