DataraFlow Week 9: Grouping Data To Draw Out Specific Insights
Last week, we learnt how to combine two datasets containing a common column to create a single bigger dataset which is is easier to analyze. This week, we go even deeper by learning how to separate a large dataset into distinct groups or subsets of data to make analysis a bit easier.
As a way of walking through my lessons for the week, I will be analyzing a dataset from Comtrade which contains information on Nigeria’s importation and exportation of palm oil grouped into two categories: crude or anything other than crude.
Loading In The Dataset & Cleaning It
- The dataset was downloaded from the UN Comtrade website as a CSV file, which was loaded into a dataframe with the Pandas module. In order to make the data easier to analyze and present, irrelevant columns were dropped to create a subset of the original column that contained only key data. The selected columns were also renamed to make the data more descriptive to the average person.
import pandas as pd
file = 'nigeria_palmoil_monthly_2023.csv'
palmOil = pd.read_csv(file, encoding='ISO-8859-1')
def toMillions(x):
x /= 1000000
return x
def oilType(code):
if code == 151110:
return 'crude'
if code == 151190:
return 'not crude'
return 'unknown'
palmOil['fobvalue (millions)'] = palmOil['fobvalue'].apply(toMillions)
COMM_CODE = 'Commodity Code'
COMMODITY = 'Palm Oil'
REPORTER = 'Reporter'
PARTNER = 'Partner'
DATE, YEAR, MONTH = 'Date', 'Year', 'Month'
palmOil[COMM_CODE], palmOil[COMMODITY] = palmOil['isOriginalClassification'], palmOil['cmdCode']
palmOil[REPORTER], palmOil[PARTNER] = palmOil['reporterISO'], palmOil['partnerISO']
palmOil[DATE], palmOil[YEAR], palmOil[MONTH] = palmOil['freqCode'], palmOil['refPeriodId'], palmOil['refMonth']
palmOil[COMMODITY] = palmOil[COMM_CODE].apply(oilType)
COLUMNS = [DATE, YEAR, MONTH, REPORTER, PARTNER, 'flowCode', COMM_CODE, COMMODITY,'fobvalue (millions)']
palmOil = palmOil[COLUMNS]
- The trades featuring the ‘World’ partner was omitted to focus on international relationships between individual countries. After which, the dataframe was sorted in descending order of trade value to highlight the biggest contributors to palm oil trade in Nigeria.
palmOil_countries = palmOil[(palmOil[PARTNER] != 'World') & (palmOil[COMM_CODE] != 1511)]
palmOil_countries = palmOil_countries.sort_values('fobvalue (millions)', ascending=False)
An Analysis Of Nigeria’s Palm Oil Trade (Based on 2023 Data)
By sorting the import data by value, I identified the dominant supplier.
# Group by Partner and Year, filter for 2023
imports = palmOilCountry[palmOilCountry['flowCode'] == 'Import']
top_sources = imports.sort_values('fobvalue (millions)', ascending=False)
# Visualization
top_sources.head(10)
This confirmed Malaysia as the primary source of Nigeria's palm oil imports for 2023.