If I have a factory that consumes natural gas, it’s not the same to buy it for consumption the next day as for the next year. Prices vary depending on the delivery date, and it’s important to have an overview of how they behave to make purchasing decisions.
In Figure 1, we see that, in general, future prices are higher than current prices. This is known as contango.
As they say in Argentina: it takes two to tango.
All very clear, but… how do we process the raw data provided by MIBGAS for a contract comparison visualization?
Data
Each row represents a trading date for a specific contract depending on when the gas will be delivered.
- GDAES_D+1: Next day
- GMAES: Next month
- GQES_Q+1: Next quarter
- GYES_Y+1: Next year
In this tutorial, we work with 2024 data in CSV format, obtained from MIBGAS.
import pandas as pd
df = pd.read_csv('data/MIBGAS_Data_2024.csv', sep=';', skiprows=1)
Questions
- Why filter and rename the columns of the
DataFrame
? - How do we restructure the contract categories into
columns of the
DataFrame
? - What function is used to convert text to dates?
- How to visualize data directly from the
DataFrame
with a function? - What smart technique is applied to fill missing data in time series?
- Why is it important for the time series to be sorted by date?
Methodology
Select and rename columns
We will compare the different gas prices by contract type and trading date.
df = df.filter(regex='Trading|Product|MIBGAS')
df.columns = ['date_trading', 'contract', 'price']
Restructure contracts into columns
To simplify working with the DataFrame
, we will
restructure it so that each column represents a contract and each row a
trading day.
df = df.pivot(index='date_trading', columns='contract', values='price')
Format date column
By default, dates are in string
format. We will convert
it to datetime
so that visualization and analysis functions
take into account the temporal nature of the data.
df.index = pd.to_datetime(df.index, dayfirst=True)
Sort rows by date
It is very important that the data is sorted by date so that visualization is done in the correct order.
df.sort_index(inplace=True)
Comparative column visualization
Since the pandas
DataFrame
is connected to
the matplotlib
library, we can visualize the data directly
using the plot
function.
df.plot();
We observe many irregular jumps due to missing data on some days, probably due to holidays or weekends.
Linear interpolation
We will use linear interpolation to fill these gaps, making the visualization clearer.
df.interpolate(method='linear', inplace=True)
Now we can compare the different gas contracts over time.
If we observe that lighter colors indicate a more distant horizon and are mostly positioned at the top of the graph, we can conclude that, in the long term, gas prices are higher.
Conclusions
Thanks to this tutorial, you now know how to:
- Filter & Rename Columns: Simplify the
DataFrame
to work only with the variables you’re interested in and rename them for easier handling. - Restructure DataFrames: The
pivot
function allows you to restructure categories from one column into independent columns, simplifying data visualization and analysis. - Transform Text to Dates: Using
pd.to_datetime
to convert strings todatetime
objects is essential for efficiently manipulating time series. - Data Visualization: The integration of
Pandas
withMatplotlib
enables direct and effective visualizations from theDataFrame
. - Linear Interpolation: Linear interpolation is a key technique for dealing with missing data, allowing for a continuous representation of time series.
- Sort Time Series: It’s crucial to sort data by date to ensure temporal coherence in analysis and visualization.
Take a step forward and learn to develop algorithms and applications with our digital courses in Udemy.