DataraFlow Week 6: Data Analysis In Python With Pandas and NumPy

Following my research proposal, it was finally time to start building the skills needed to implement the proposed solutions. Data processing is one of the key skills, and there are several ways of analysing data to achieve a goal or solve a specific problem. Microsoft Excel is a common choice; however, Python also has libraries that make it easy to process and visualize data. Two of them, NumPy and Pandas, are discussed below.

Numerical Python (NumPy)

NumPy is a Python library for working with arrays. In data analysis, these arrays are preferred over traditional Python lists because they are faster and more efficient to access and manipulate. They also come with many efficient methods and functions that data scientists use to gain insights from their data. NumPy can be installed by running the command below in the terminal, after which it can be imported into a Python file.

pip install numpy

Once NumPy is imported, it can be used to create arrays, which have a host of properties and can come in several dimensions:

import numpy as np
arr = np.array([1, 2, 3, 4, 5])  # 1-D array
print(arr)

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])  # 2-D array (a Python name cannot start with a digit)
print(arr_2d)
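
To make the "host of properties" a little more concrete, here is a minimal sketch that inspects a few standard attributes of a 2-D array. The variable name arr_2d is my own choice for illustration.

```python
import numpy as np

# A 2-D array with 2 rows and 3 columns
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

print(arr_2d.ndim)   # number of dimensions: 2
print(arr_2d.shape)  # (rows, columns): (2, 3)
print(arr_2d.size)   # total number of elements: 6
print(arr_2d.dtype)  # element type: an integer type whose width depends on the platform
```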

Like traditional Python data structures, arrays can be sliced, copied, reshaped, flattened, iterated over, and joined, among other operations:

import numpy as np  
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])  
print(arr.shape)  # check the shape: (2, 4)
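
Slicing, reshaping, and flattening are mentioned above, so here is a short sketch of those three operations on the same 2x4 array. The variable names are only illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Slicing: take the first row, then columns 1 and 2 of every row
first_row = arr[0]
middle_cols = arr[:, 1:3]
print(first_row)    # [1 2 3 4]
print(middle_cols)  # [[2 3]
                    #  [6 7]]

# Reshaping: same 8 elements, now arranged as 4 rows of 2
reshaped = arr.reshape(4, 2)
print(reshaped.shape)  # (4, 2)

# Flattening: collapse to a 1-D copy
flat = arr.flatten()
print(flat)  # [1 2 3 4 5 6 7 8]
```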

import numpy as np  
arr = np.array([1, 2, 3, 4, 5])  
x = arr.copy()  # copy: a new array that owns its own data
y = arr.view()  # view: a new array object that shares data with arr
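
The difference between a copy and a view only shows up once the original array changes. A minimal sketch, where the value 42 is arbitrary:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()  # independent copy
y = arr.view()  # shares the same underlying data as arr

arr[0] = 42     # modify the original

print(x[0])  # 1  (the copy is unaffected)
print(y[0])  # 42 (the view reflects the change)

# The base attribute tells us who owns the data
print(x.base is None)  # True: the copy owns its data
print(y.base is arr)   # True: the view borrows data from arr
```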

import numpy as np
arr1 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
for x in np.nditer(arr1):  # iterate over every scalar element of the 3-D array
    print(x)

import numpy as np  
arr1 = np.array([1, 2, 3])  
arr2 = np.array([4, 5, 6])  
arr = np.concatenate((arr1, arr2))  #joining 2 arrays together
print(arr)

import numpy as np  
arr1 = np.array([1, 2, 3])  
arr2 = np.array([4, 5, 6])  
arr = np.stack((arr1, arr2), axis=1)  # stack along a new second axis, pairing elements: shape (3, 2)
print(arr)
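
The axis argument decides how the arrays are arranged, which is easy to get mixed up. As a quick sketch, here is the same pair of arrays stacked along axis 0 and along axis 1:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# axis=0: each input becomes a row (one array on top of the other)
rows = np.stack((arr1, arr2), axis=0)
print(rows)        # [[1 2 3]
                   #  [4 5 6]]
print(rows.shape)  # (2, 3)

# axis=1: each input becomes a column (elements are paired up)
cols = np.stack((arr1, arr2), axis=1)
print(cols)        # [[1 4]
                   #  [2 5]
                   #  [3 6]]
print(cols.shape)  # (3, 2)
```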

Pandas Module In Python

Pandas is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language.
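
The central Pandas object is the DataFrame, a table with labelled columns. Here is a minimal sketch that builds one from a dictionary; the names and scores are made-up sample data, not from any real dataset.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "name": ["Ada", "Ben", "Cara"],
    "score": [90, 72, 85],
})

print(df)                  # the full table
print(df.shape)            # (3, 2): 3 rows, 2 columns
print(df["score"].mean())  # average of the score column
```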

Loading and Manipulating Data in Pandas

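# Jupyter notebook magic: install xlrd, which Pandas uses to read legacy .xls files
%pip install xlrd

import pandas as pd

# 'FilePath.xls' is a placeholder for the path to the Excel file
data = pd.read_excel('FilePath.xls')
data  # in a notebook, the last expression in a cell is displayed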
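
Once the data is loaded, it can be manipulated like any other DataFrame. The contents of the Excel file are not shown here, so the sketch below uses a small hypothetical table (the region and sales columns are assumptions of mine) to illustrate a few common operations: previewing, filtering, grouping, and adding a column.

```python
import pandas as pd

# Hypothetical stand-in for data loaded from a file
data = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "sales": [100, 150, 200, 50],
})

# Preview the first rows
print(data.head())

# Filter: keep only rows where sales exceed 90
high = data[data["sales"] > 90]
print(len(high))  # 3

# Group: total sales per region
totals = data.groupby("region")["sales"].sum()
print(totals["North"])  # 300
print(totals["South"])  # 200

# Add a derived column: sales expressed in thousands
data["sales_k"] = data["sales"] / 1000
```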
