Introduction to Data Retrieval Using Python – A Beginner's Guide

Python is an open-source scripting language with many modules and libraries for information extraction and retrieval. In this article, we discuss data retrieval using Python and how to get information from APIs, which organizations and companies use to share data with one another. The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed information systems and is the foundation of communication on the World Wide Web.

Python handles HTTP requests and integration with web services smoothly through the third-party requests library. With requests, there is no need to manually add query strings to your URLs or to form-encode POST data. The library is installed with the following command:

pip install requests
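As a minimal sketch of what this means in practice, the snippet below builds requests without sending them (using `requests.Request(...).prepare()`) and prints what requests generated: the `params` dictionary becomes the query string, and the `data` dictionary becomes a form-encoded POST body. The api.example.com URLs and the field names are hypothetical placeholders.

```python
import requests

# Build (but do not send) a GET request; the URL is a hypothetical placeholder.
search = requests.Request(
    "GET",
    "https://api.example.com/search",
    params={"q": "data retrieval", "page": 2},  # encoded into the query string
).prepare()
print(search.url)  # https://api.example.com/search?q=data+retrieval&page=2

# Build (but do not send) a POST request with form data.
login = requests.Request(
    "POST",
    "https://api.example.com/login",
    data={"user": "alice", "lang": "en"},  # form-encoded into the request body
).prepare()
print(login.body)  # user=alice&lang=en
print(login.headers["Content-Type"])  # application/x-www-form-urlencoded
```

To actually send them, pass the same arguments to `requests.get(url, params=...)` or `requests.post(url, data=...)`.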

A web API is described in terms of the following parts:

1. Method
2. Base URI
3. Element
4. Media types

The sections of a web API for a specific URL are described below:

[Figure: sections of a web API for a specific URL]

The method is part of the HTTP protocol and includes verbs such as GET, POST, PUT and DELETE. These verbs usually do what their names imply: GET retrieves data, POST and PUT create or change data, and DELETE removes it. In requests, every HTTP call, such as requests.get(), returns a Response object.
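To connect the four parts listed above with code, here is a minimal sketch against a hypothetical REST API: the api.example.com base URI, the `item3` element and the JSON media type are placeholders, not a real service. The requests are only prepared, not sent, so the snippet needs no network access; for each call it prints the method, the URI (base URI plus optional element) and the media type requested through the `Accept` header.

```python
import requests

# Hypothetical REST API, used only for illustration
BASE_URI = "https://api.example.com/collection/"
ELEMENT = "item3"
MEDIA_TYPE = "application/json"

# Typical REST pairing of methods with the collection or a single element
calls = [
    ("GET", BASE_URI),               # list the collection
    ("POST", BASE_URI),              # create a new element in the collection
    ("GET", BASE_URI + ELEMENT),     # retrieve one element
    ("PUT", BASE_URI + ELEMENT),     # replace or update that element
    ("DELETE", BASE_URI + ELEMENT),  # delete that element
]

prepared = []
for method, uri in calls:
    # Prepare the request without sending it
    req = requests.Request(method, uri, headers={"Accept": MEDIA_TYPE}).prepare()
    prepared.append(req)
    print(req.method, req.url, req.headers["Accept"])
```

To send one of these, call `requests.request(method, uri, headers={"Accept": MEDIA_TYPE})` or the matching shortcut such as `requests.get()` or `requests.delete()`; each returns a Response object.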

Code implementation to get the text of the response:
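A minimal sketch of fetching a URL and reading the text of the Response object. So that it runs without internet access, it starts a throwaway local HTTP server (the `DemoHandler` class and its JSON body are hypothetical); against a real API you would pass the API's URL to `get_text()` instead.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint so the example runs offline."""

    def do_GET(self):
        body = b'{"message": "hello"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence the default per-request logging


def get_text(url: str, timeout: float = 5.0) -> str:
    """Send a GET request and return the body of the Response object as text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx status codes
    return response.text  # body decoded to a str


# Port 0 asks the operating system for any free port
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

try:
    text = get_text(f"http://127.0.0.1:{server.server_port}/")
    print(text)  # {"message": "hello"}
finally:
    server.shutdown()
    server.server_close()
```

Besides `.text`, the same Response object exposes `.status_code`, `.headers` and `.content` (the raw bytes).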

The response received is as follows:

[Figure: output response]

JSON Parsing:

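JavaScript Object Notation (JSON) is a lightweight, text-based data format derived from the way objects are written in JavaScript. The Python library “requests” has a JSON parser built into its Response object: response.json() decodes the response body into Python objects. It can also convert a Python dictionary into a JSON string when the dictionary is passed as the json argument of a request. Consider the following example, which explains JSON parsing.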

It can be parsed with Python code as shown below:
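A minimal sketch of JSON parsing with Python's built-in json module, using a hypothetical payload. JSON objects become dicts, arrays become lists, `true`/`false` become `True`/`False` and `null` becomes `None`; with requests, `response.json()` produces the same kind of Python objects from an API response body.

```python
import json

# Hypothetical JSON text, e.g. the body of an API response
payload = '{"name": "Alice", "languages": ["Python", "SQL"], "active": true, "manager": null}'

data = json.loads(payload)  # JSON text -> Python objects
print(type(data).__name__)  # dict
print(data["name"])  # Alice
print(data["languages"][0])  # Python
print(data["active"], data["manager"])  # True None
```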

You can also dump key-value pairs (a Python dictionary) into JSON format.
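A minimal sketch of the reverse direction with `json.dumps()`, using a hypothetical record. The compact form is what you would typically send to an API; `indent` and `sort_keys` give a human-readable version. `json.dump()` writes to an open file instead, and `requests.post(url, json=record)` performs this serialization automatically.

```python
import json

# Hypothetical record to serialize
record = {"title": "Intro to requests", "score": 42, "tags": ["python", "http"], "draft": False}

compact = json.dumps(record)  # dict -> JSON string
print(compact)  # {"title": "Intro to requests", "score": 42, "tags": ["python", "http"], "draft": false}

pretty = json.dumps(record, indent=2, sort_keys=True)  # indented, keys in alphabetical order
print(pretty)
```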

Web Scraping:

Web scraping is the practice of using a computer program to sift through a web page and gather its data in a structured format. In other words, web scraping, also called “web harvesting” or “web data extraction”, is the construction of an agent that parses and organizes data from the web in an automated manner.
Web scraping can be used in scenarios such as the following:

1. Retrieving information from a table on a wiki page.
2. Collecting the reviews of a particular movie to perform text mining. A user could even build a predictive model to spot fake reviews.
3. Displaying the analysis as a visualization.
4. Enriching a data set with information available on a specified website.
5. Tracking trending news stories on a particular topic of interest.
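The first scenario can be sketched with BeautifulSoup. The hypothetical `table_to_records()` function below turns the first HTML table in a page into a list of dictionaries keyed by the header row. It assumes a simple table with a single header row and no rowspan/colspan cells; real wiki tables often need extra handling. The sample HTML is made up for illustration.

```python
from bs4 import BeautifulSoup


def table_to_records(html: str) -> list[dict]:
    """Convert the first <table> in `html` into a list of row dictionaries."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = table.find_all("tr")
    if not rows:
        return []
    # First row holds the column names
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        if len(cells) == len(headers):  # skip rows that do not line up with the header
            records.append(dict(zip(headers, cells)))
    return records


sample = """
<table class="wikitable">
  <tr><th>Language</th><th>First appeared</th></tr>
  <tr><td>Python</td><td>1991</td></tr>
  <tr><td>JavaScript</td><td>1995</td></tr>
</table>
"""
records = table_to_records(sample)
print(records)
```

For a live page, the HTML would come from `requests.get(url).text`.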

Example:

In this example, we focus on scraping a web page and storing the information in dictionary format. Hacker News is a popular aggregator of news articles that many hackers find interesting. To scrape it, the following three libraries are needed:
1. Requests
2. re (part of Python's standard library, so it needs no installation)
3. BeautifulSoup

The third-party libraries can be installed with the following command:

pip install requests beautifulsoup4
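A minimal sketch of such a scraper, combining requests, re and BeautifulSoup. It assumes the current Hacker News markup: each story is a `tr` with class `athing`, the title link sits inside `span.titleline`, and the following row holds `span.score` plus an "N comments" link. That markup is an assumption and may change. The function names `parse_front_page()` and `scrape_hacker_news()` are hypothetical; parsing is kept separate from fetching so it can be checked against saved HTML.

```python
import re

import requests
from bs4 import BeautifulSoup

HN_URL = "https://news.ycombinator.com/"


def parse_front_page(html: str) -> list[dict]:
    """Turn Hacker News front-page HTML into a list of story dictionaries."""
    soup = BeautifulSoup(html, "html.parser")
    stories = []
    for row in soup.select("tr.athing"):
        link = row.select_one("span.titleline > a")
        if link is None:
            continue  # not a story row in the assumed format
        score, comments = 0, 0
        subtext = row.find_next_sibling("tr")  # row with score and comment count
        if subtext is not None:
            score_tag = subtext.select_one("span.score")
            if score_tag is not None:
                match = re.search(r"\d+", score_tag.get_text())  # "118 points" -> 118
                if match:
                    score = int(match.group())
            for anchor in subtext.find_all("a"):
                # "45 comments" -> 45; a "discuss" link means no comments yet
                match = re.search(r"(\d+)\s*comments?", anchor.get_text())
                if match:
                    comments = int(match.group(1))
        stories.append(
            {
                "link": link.get("href"),
                "score": score,
                "title": link.get_text(strip=True),
                "comments": comments,
            }
        )
    return stories


def scrape_hacker_news(url: str = HN_URL) -> list[dict]:
    """Fetch the front page with requests and parse it into dictionaries."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return parse_front_page(response.text)
```

Usage: `for story in scrape_hacker_news()[:5]: print(story)` prints one dictionary per story with the keys link, score, title and comments.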

Explanation:

1. The page is fetched with requests from the given URL.
2. Iterating over the list of story items with a for loop gives a systematic way of creating JSON-style (key-value) pairs.
3. Each dictionary (JSON object) created contains the keys link, score, title and comments.

Output:

[Figure: web scraping concept]

The above illustration shows the web scraping concept without the use of any API. Web scraping can be preferable to using an API in the following situations:

1. The website from which the user wants to extract information does not provide any API (Application Programming Interface).
2. The API provided is not free of cost.
3. The API provided is limited, meaning a user can access it only for a specific period of time.
4. The API does not expose the data that is needed, but the website does.

Conclusion:

Data science researchers and analysts using Python can easily scrape or fetch information from a given website for their research. Retrieving information with Python follows a simple procedure, and the requests library can also handle cookies and session values when a site requires them. We also focused on various methods of information retrieval that can be used in research.