Dataraflow Week 7: The Perfect Holiday


Week 7 at Dataraflow kicked off by diving deeper into Pandas dataframes. This time, the main lesson centered on preprocessing and cleaning data. This stage of the data processing cycle enables the data scientist to draw key insights from the data by removing biases and other elements that might get in the way. Like a butcher who sharpens his tools before slicing meat, data scientists must also learn to clean data, as it is a crucial skill for pinpointing problems and solving them.

Picking The Best Weeks To Go on Holiday

I obtained the weather dataset for Brasilia, the capital of Brazil. The data spanned December through February, the summer months for countries in the Southern Hemisphere. Weather Underground’s meteorological records proved helpful in the quest to find the best two-week period for a summer holiday. After obtaining the data, I saved it to a CSV file, which I then analyzed using the Pandas library in Python.

Passing The Dataset Into Python Pandas & Cleaning It

import pandas as pd

# Load the weather records saved from Weather Underground.
brasilia = pd.read_csv('Brasilia_Weather.csv')

# Parse the dates and use them as the index so rows can be selected by date.
brasilia['Date'] = pd.to_datetime(brasilia['Date'])
brasilia.index = brasilia['Date']

  • I passed the dataset into a Pandas DataFrame using the built-in read_csv() function.

  • All the columns were filled and sorted, so there was no need to invoke the .dropna() and .sort_index() methods.

  • I then converted the ‘Date’ column to the datetime64 data type and set it as the index, to ensure ease of access when retrieving dates.
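
The claim that nothing needed dropping or sorting can be checked rather than eyeballed. Below is a minimal sketch of such a check, run on a small made-up frame that stands in for the real CSV (the values are hypothetical; only the column names come from the post). It also checks for skipped calendar days, which matters later because the two-week window is counted in rows.

```python
import pandas as pd

# Hypothetical stand-in for the real CSV; values are made up for illustration.
sample = pd.DataFrame({
    "Date": ["2024-12-01", "2024-12-02", "2024-12-03"],
    "Temperature_Avg_F": [74.0, 76.5, 73.2],
})
sample["Date"] = pd.to_datetime(sample["Date"])
sample = sample.set_index("Date")

# Count missing values across the whole frame; 0 means .dropna() would change nothing.
missing_cells = int(sample.isna().sum().sum())

# True when the dates are already in ascending order, so .sort_index() is unnecessary.
is_sorted = sample.index.is_monotonic_increasing

# True when no calendar day is skipped or repeated between the first and last date.
expected_days = pd.date_range(sample.index.min(), sample.index.max(), freq="D")
has_no_gaps = len(sample) == len(expected_days) and sample.index.is_unique

print(missing_cells, is_sorted, has_no_gaps)
```

If any of the three checks fails on the real data, the cleaning steps skipped above would need to be applied before scoring.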

Finding The Perfect Summer Break

  • To find the perfect summer break, there was a need to establish a ranking system for each day based on the prevalent environmental factors: wind speed, temperature, humidity, and precipitation. Each factor was worth one point, for a maximum of 4 points a day. With these scores, it was a lot more straightforward to compare the days and determine the highest-scoring ones.

  • I restricted the temperature to 72 - 78 degrees Fahrenheit, as that proved to be the sweet spot for frolicking around in the sun, without the heat becoming an inconvenience. Humidity of less than 75% also guarantees quick evaporation of moisture, so that days don’t feel hotter than they actually are. Wind speeds between 3 and 10 mph are ideal for having fun in the sun without the fear of your vacation outfit being blown away. Finally, a precipitation total of 0 inches ensures that rainfall does not halt any outdoor activities or plans.

  • I then created a rolling score, which summed the score of each day with those of the 13 days before it, giving a total for every run of 14 consecutive days.

# Each factor scores 1 when the day falls inside its comfortable range, otherwise 0.
temp_score = ((brasilia['Temperature_Avg_F'] >= 72) & (brasilia['Temperature_Avg_F'] <= 78)).astype(int)
humidity_score = (brasilia['Humidity_Avg_%'] < 75).astype(int)
wind_score = ((brasilia['WindSpeed_Avg_mph'] >= 3) & (brasilia['WindSpeed_Avg_mph'] <= 10)).astype(int)
precip_score = (brasilia['Precipitation_Total_in'] == 0).astype(int)

# Daily score out of 4, then the total for each day plus the 13 days before it.
brasilia['comfort_score'] = temp_score + humidity_score + wind_score + precip_score
brasilia['Two_week_score'] = brasilia['comfort_score'].rolling(window=14).sum()
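
To make the rolling window concrete, here is a small sketch on made-up scores (not the Brasilia data). It illustrates two things about rolling(window=14).sum(): the first 13 rows come out as NaN because they do not yet have a full window, and each value covers the current day plus the 13 days before it.

```python
import pandas as pd

# Made-up daily comfort scores (0 to 4) for 16 consecutive days; not the Brasilia data.
dates = pd.date_range("2025-01-01", periods=16, freq="D")
scores = pd.Series([2, 3, 4, 4, 3, 2, 4, 4, 4, 3, 4, 4, 2, 3, 4, 1], index=dates)

rolling = scores.rolling(window=14).sum()

# Rows without 14 days of history are NaN by default.
n_incomplete = int(rolling.isna().sum())

# The value on day 14 covers days 1 through 14, including day 14 itself.
first_full = rolling.iloc[13]
manual_sum = scores.iloc[0:14].sum()

print(n_incomplete, first_full, manual_sum)
```

Because the window ends on the current row, the date attached to each rolling score is the last day of its two-week stretch, not the first.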

The Best Two-Week Break In Brasilia

  • I then looked up the maximum rolling score. The row holding it marks the end of the 14-day window (that day plus the 13 days before it) with the best combination of environmental conditions.

  • I then traced back thirteen days to find the start of this two-week stretch, which led me to discover the best two weeks to go on vacation to Brasilia.

  • The best start date for the summer break was revealed to be Thursday, February 13, 2025, with the end date being Wednesday, February 26, 2025.

  • In addition, the average Temperature, Humidity, Wind Speed, and Total Precipitation for these two weeks were also computed, which amounted to:

    • Average Temperature: 76.2°F

    • Average Humidity: 66.1%

    • Average Wind Speed: 6.3 mph

    • Total Precipitation: 0.00 in

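# idxmax returns the date on which the best 14-day window ends.
best_end_date = brasilia['Two_week_score'].idxmax()
# The window includes its end date, so it starts 13 days earlier.
best_start_date = best_end_date - pd.Timedelta(days=13)

# Slicing by date label includes both endpoints.
best_period = brasilia.loc[best_start_date:best_end_date]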

print("Based on a comfort index score created using temperature, humidity, wind speed, and precipitation:")

print("The BEST 14-day period for a holiday is:")
print(f"   Start Date: {best_start_date.strftime('%A, %B %d, %Y')}")
print(f"   End Date:   {best_end_date.strftime('%A, %B %d, %Y')}")

print("\nWeather Averages for this period:")
print(f"Avg. Temperature: {best_period['Temperature_Avg_F'].mean():.1f}°F")
print(f"Avg. Humidity:    {best_period['Humidity_Avg_%'].mean():.1f}%")
print(f"Avg. Wind Speed:  {best_period['WindSpeed_Avg_mph'].mean():.1f} mph")
print(f"Total Precip:     {best_period['Precipitation_Total_in'].sum():.2f} in")
