Working with temporal properties using pandas Datetime Index

Leverage the properties of DatetimeIndex in Pandas for more efficient time series analysis, from formatting the column to creating reports with pivot tables.

The electricity generated by a photovoltaic solar plant varies significantly according to the hour and month.

Pivot table of solar energy generation by hour and month at a photovoltaic plant during the year 2023.
F1. Energy generation by hour and month in 2023

The energy generation at 18:00 is not the same as at 21:00, nor in January as in June.

Data

The raw data represent solar energy generation at a photovoltaic plant over the course of the year 2023.

import pandas as pd

df = pd.read_csv('data.csv')
Raw data of solar energy generation at a photovoltaic plant throughout the year 2023.
F2. Raw solar energy data throughout 2023

Questions

To create the report from the raw data, we need to answer the following questions:

  1. How to format the temporal column with pandas?
  2. Why is it useful to set the temporal column as an index?
  3. How to create independent temporal columns?
  4. How to aggregate data based on hour and month?
  5. How to filter data relevant for analysis?
  6. What technique highlights the variation in energy generation throughout the day?

Methodology

Formatting Temporal Column

The temporal column must be recognized as a datetime object to work with the temporal properties offered by pandas.

For this, we use the pd.to_datetime function.

df['datetime'] = pd.to_datetime(df['datetime'])
df.dtypes
DataFrame data types after converting the temporal column to DateTime type in Pandas.
F3. Temporal column as datetime for analysis

Temporal Column to Index

If our data table has a temporal column with unique values, it’s good practice to set it as an index.

df.set_index('datetime', inplace=True)
Pandas DataFrame with the temporal column set as an index to facilitate temporal operations.
F4. Using temporal column as index

Creating Independent Temporal Columns

We would like to aggregate the total energy generation by month and hour. To do this, we create two new columns: month and hour.

Would this be interesting to one of your friends? Share it with them.

df = (df
 .assign(
     month = df.index.month,
     hour = df.index.hour,
    )
)
Pandas DataFrame showing the new month and hour columns derived from the temporal index.
F5. Month and hour columns for detailed analysis

Aggregating Data by Hour and Month

Having new columns that can be interpreted as categorical, we use the pivot_table function to create a report table representing the total energy generation by month and hour.

dfr = (df
 .pivot_table(
     index='month', columns='hour',
     values='energy', aggfunc='sum'
    )
)
Pivot table in Pandas showing total solar energy generation by month and hour.
F6. Energy generation analysis by month and hour

Locating Relevant Data

As expected, solar generation produces 0 energy during the night.

To clean up the table, we select with loc the hours from 6:00 to 21:00.

dfr.loc[:, 6:21]

Heat Matrix Analysis

The report is quite bland; it needs some salt and a bit of spice.

We add it following this recipe et voilà.

Heat matrix representing solar energy generation by month and hour, highlighting generation peaks
F7. Heat matrix highlighting energy variation

It’s curious that, just before the summer starts, in full June, the generation does not continue the trend from spring. What happened?

Looking forward to reading your comments.

Conclusions

In conclusion, thanks to this tutorial we have answered the questions posed at the beginning:

  1. Formatting the temporal column: pd.to_datetime converts the specified column to date and time format.
  2. Setting the temporal column as an index: set_index facilitates temporal operations by making the time column the DataFrame index.
  3. Creating independent temporal columns: DatetimeIndex allows extracting independent temporal properties.
  4. Aggregating data based on hour and month: pivot_table organizes data by month and hour, useful for aggregate analysis.
  5. Filtering data relevant for analysis: loc selects segments of the DataFrame.
  6. Highlighting the variation in energy generation throughout the day: pivot_table combined with color gradients highlights temporal patterns.

If you could program whatever you wanted, what would it be?

I might give you a hand by creating tutorials that help you. I’ll read you in the comments.

Great! You’ve successfully signed up.

Welcome back! You've successfully signed in.

You've successfully subscribed to datons.

Success! Check your email for magic link to sign-in.

Success! Your billing info has been updated.

Your billing was not updated.