DataraFlow Week 6: Data Analysis In Python With Pandas and NumPy
Following my research proposal, it was time to start building the skills needed to implement the solutions I proposed. Data processing is one of the key methods, and there are several ways of analysing data to reach a goal or solve a specific problem. Microsoft Excel is a common choice; however, Python also has libraries that make it easy to process and visualize data, two of which are discussed below.
Numerical Python (NumPy)
NumPy is a Python library used for working with arrays. These arrays are preferred to traditional Python lists in data analysis because they are faster and more efficient to access and manipulate. They also come with many useful methods and functions that data scientists take advantage of to gain insights from their data. NumPy can be installed by running the code below in the terminal, after which it can be imported into a Python file.
pip install numpy
Once NumPy is imported, it can be used to create arrays, which have a host of properties and can come in several dimensions:
import numpy as np
arr = np.array([1, 2, 3, 4, 5]) #1-D
print(arr)
arr_2d = np.array([[1, 2, 3], [4, 5, 6]]) #2-D (a Python name cannot start with a digit)
print(arr_2d)
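To make the "host of properties" a little more concrete, here is a minimal sketch that inspects a few of them and shows element-wise arithmetic. The names `vec` and `grid` are my own illustrative choices, not part of any dataset.

```python
import numpy as np

# Illustrative arrays; the names vec and grid are hypothetical
vec = np.array([1, 2, 3, 4, 5])
grid = np.array([[1, 2, 3], [4, 5, 6]])

# ndim = number of dimensions, shape = length along each dimension,
# size = total number of elements
print(vec.ndim, vec.shape, vec.size)
print(grid.ndim, grid.shape, grid.size)

# Arithmetic is applied element by element, with no explicit loop
doubled = vec * 2
# axis=1 sums across the columns of each row, giving one total per row
row_sums = grid.sum(axis=1)
print(doubled)
print(row_sums)
```

This element-wise style is a large part of why arrays tend to be more convenient than plain lists for numerical work.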
Like traditional Python data structures, arrays can be sliced, copied, reshaped, flattened, iterated over, and joined, among other operations:
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape) #checking the shape
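Slicing, reshaping, and flattening are mentioned above, so here is a short sketch of each on the same 2-by-4 array. The variable names are only illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Slicing: the first row, then columns 1 and 2 of every row
first_row = arr[0]
middle_cols = arr[:, 1:3]

# Reshaping: the same 8 elements laid out as 4 rows of 2
reshaped = arr.reshape(4, 2)

# Flattening: collapse to one dimension (flatten returns a copy)
flat = arr.flatten()

print(first_row)
print(middle_cols)
print(reshaped.shape)
print(flat)
```

Note that `reshape` only works when the new shape holds exactly the same number of elements as the original.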
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
x = arr.copy() #copying an array: x owns its own data
y = arr.view() #viewing an array: y shares its data with arr
arr1 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
for item in np.nditer(arr1): #iterating over every element of a 3-D array
    print(item)
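The difference between a copy and a view only shows up once the original array changes, so here is a small sketch of that. The names `copied` and `viewed` are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
copied = arr.copy()   # independent data
viewed = arr.view()   # same underlying data as arr

# Change the original after taking the copy and the view
arr[0] = 42

print(copied.tolist())  # the copy keeps the old value
print(viewed.tolist())  # the view reflects the change

# The base attribute tells the two apart:
# a copy owns its data (base is None), a view points back at the original
print(copied.base is None)
print(viewed.base is arr)
```

In practice this means a view is cheap but risky to modify, while a copy is safe to change at the cost of extra memory.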
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2)) #joining 2 arrays together
print(arr)
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.stack((arr1, arr2), axis=1) #stacking 2 arrays along a new axis; axis=1 pairs them up as columns
print(arr)
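Since joining and stacking are easy to mix up, the sketch below compares the resulting shapes side by side. The names `a`, `b`, `joined`, `as_rows`, and `as_cols` are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# concatenate joins along an existing axis: the result is still 1-D
joined = np.concatenate((a, b))

# stack adds a new axis:
# axis=0 puts each input on its own row, axis=1 puts each input in its own column
as_rows = np.stack((a, b), axis=0)
as_cols = np.stack((a, b), axis=1)

print(joined.shape)
print(as_rows.shape)
print(as_cols.shape)
```

So stacking "on top of each other" corresponds to `axis=0`, while `axis=1` gives one row per element position.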
Pandas Module In Python
Pandas is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language.
Loading and Manipulating Data in Pandas
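# %pip is a Jupyter magic; in a terminal, run "pip install xlrd" instead
%pip install xlrd
import pandas as pd
data = pd.read_excel('FilePath.xls') # replace 'FilePath.xls' with the path to your workbook
data # in a notebook this displays the table; in a script, use print(data)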
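Once the data is loaded, a few basic manipulations might look like the sketch below. Because the contents of the spreadsheet are not shown here, I build a small DataFrame in memory instead; the column names `city` and `sales` and all the values are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical sample data standing in for the contents of a spreadsheet
data = pd.DataFrame({
    "city": ["Accra", "Lagos", "Nairobi", "Lagos"],
    "sales": [120, 340, 210, 150],
})

# Quick look at the first rows and summary statistics of numeric columns
print(data.head())
print(data.describe())

# Filtering: keep only rows where sales exceed 200
high_sales = data[data["sales"] > 200]

# Grouping: total sales per city
totals = data.groupby("city")["sales"].sum()

# Adding a derived column computed from an existing one
data["sales_thousands"] = data["sales"] / 1000

print(high_sales)
print(totals)
```

The same calls should work on a DataFrame returned by `pd.read_excel`, provided the column names are swapped for the ones in the actual file.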