Interpolate quarterly data into monthly python

I have some quarterly data that I need to convert to monthly in order to work with another data set. The data looks like this:

Date Value
1/1/2010 100
4/1/2010 130
7/1/2010 160

What I need to do is impute the values for the missing months so that it looks like this:

Date Value
1/1/2010 100
2/1/2010 110
3/1/2010 120
4/1/2010 130
5/1/2010 140
6/1/2010 150
7/1/2010 160

I couldn't find many previous questions on how to do this, only the reverse (monthly to quarterly). I tried one of those methodologies in reverse, but it didn't work:

pd.PeriodIndex(df.Date, freq='M')

What would be the easiest way to go about doing this in Pandas?
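One possible approach, sketched here under the assumption that the dates are month/day/year strings and that linear interpolation between quarters is acceptable: set the date as the index, upsample to month-start frequency with resample("MS"), and fill the new rows with interpolate(). The variable name monthly is only illustrative.

```python
import pandas as pd

# Quarterly sample data from the question
df = pd.DataFrame({"Date": ["1/1/2010", "4/1/2010", "7/1/2010"],
                   "Value": [100, 130, 160]})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Upsample to month-start frequency ("MS"); the new months appear as NaN
# and are then filled by linear interpolation between the known quarters.
monthly = (df.set_index("Date")
             .resample("MS")
             .asfreq()
             .interpolate(method="linear"))
print(monthly)
```

The default linear method treats the rows as equally spaced, which is what the expected output in the question implies.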

Time Series Interpolation for Pandas: Eating Bamboo Now, Eating Bamboo Later (Photo by Jonathan Meyer on Unsplash)

Note: Pandas version 0.20.1 (May 2017) changed the grouping API. This post reflects the functionality of the updated version.

Anyone working with data knows that real-world data is often patchy and that cleaning it takes up a considerable amount of your time (80/20 rule, anyone?). Having recently moved from Pandas to Pyspark, I was used to the conveniences that Pandas offers and that Pyspark sometimes lacks due to its distributed nature. One of the features I have learned to particularly appreciate is the straightforward way of interpolating (or in-filling) time series data that Pandas provides. This post is meant to demonstrate this capability in a simple and easily understandable way, using the example of sensor read data collected in a set of houses. The full notebook for this post can be found on my GitHub.

Preparing the Data and Initial Visualization

First, we generate a pandas data frame df0 with some test data. We create a mock data set containing two houses and use a sin and a cos function to generate some sensor read data for a set of dates. To generate the missing values, we randomly drop half of the entries.

import random

import numpy as np
import pandas as pd

# One month of daily timestamps, repeated once per house
data = {'datetime' : pd.date_range(start='1/15/2018',
                                  end='02/14/2018',
                                  freq='D')\
                     .append(pd.date_range(start='1/15/2018',
                                           end='02/14/2018',
                                           freq='D')),
        'house' : ['house1' for i in range(31)]
                  + ['house2' for i in range(31)],
        'readvalue' : [0.5 + 0.5*np.sin(2*np.pi/30*i)
                       for i in range(31)]\
                     + [0.5 + 0.5*np.cos(2*np.pi/30*i)
                       for i in range(31)]}

# The column names must match the keys of the data dictionary
df0 = pd.DataFrame(data, columns = ['datetime',
                                    'house',
                                    'readvalue'])

# Randomly drop half the reads
random.seed(42)
df0 = df0.drop(random.sample(range(df0.shape[0]),
                             k=int(df0.shape[0]/2)))

This is what the resulting table looks like:

Raw read data with missing values

The plot below shows the generated data: a sin and a cos function, both with plenty of missing data points.

We will now look at three different methods of filling in the missing read values: forward-filling, backward-filling and interpolating. Remember that it is crucial to choose an appropriate method for each task. Forecasting tasks require special care, because we need to consider whether the data used for the interpolation will actually be available at the time of the forecast. For example, if you need to interpolate data to forecast the weather, you cannot interpolate today's weather using tomorrow's weather, since it is still unknown (logical, isn’t it?).

Interpolation

To interpolate the data, we can make use of the groupby() function followed by resample(). However, first we need to convert the read dates to datetime format and set them as the index of our dataframe:

df = df0.copy()
df['datetime'] = pd.to_datetime(df['datetime'])
df.index = df['datetime']
del df['datetime']

Since we want to interpolate for each house separately, we need to group our data by ‘house’ before we can use the resample() function with the option ‘D’ to resample the data to a daily frequency.

The next step is then to use mean-filling, forward-filling or backward-filling to determine how the newly generated grid is supposed to be filled.

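mean()

Since we are strictly upsampling, the mean() method has nothing to average for the newly created days, so all missing read values are filled with NaNs:

# Select the numeric column so mean() is not applied to the 'house' strings
df.groupby('house')[['readvalue']].resample('D').mean().head(4)

Filling using mean()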
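To make the NaN grid concrete, here is a minimal sketch on a hypothetical toy frame (the name toy and its values are made up for illustration). One house has reads on two days that are three days apart, so resampling to daily frequency creates two empty days in between.

```python
import pandas as pd

# Hypothetical toy data: one house with reads on 15 and 18 January only
toy = pd.DataFrame(
    {"house": ["house1", "house1"], "readvalue": [0.2, 0.8]},
    index=pd.to_datetime(["2018-01-15", "2018-01-18"]),
)

# Upsampling to daily frequency: days without a read have nothing to
# average, so mean() leaves them as NaN.
grid = toy.groupby("house")["readvalue"].resample("D").mean()
print(grid)
```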

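ffill(): forward filling

Using ffill() instead of mean() forward-fills the NaNs. (pad() is an older alias for the same operation and has been deprecated in later pandas releases in favor of ffill().)

df_pad = df.groupby('house')\
            .resample('D')\
            .ffill()\
            .drop('house', axis=1, errors='ignore')  # 'house' is already in the index
df_pad.head(4)

Filling using ffill()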
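A small sketch of forward filling, again on a hypothetical two-house toy frame whose names and values are assumptions for illustration. Each gap takes the last read observed before it, and the fill never crosses from one house into the other because the resampling happens per group.

```python
import pandas as pd

# Hypothetical toy data: two houses with gaps between their reads
toy = pd.DataFrame(
    {"house": ["house1", "house1", "house2", "house2"],
     "readvalue": [0.2, 0.8, 1.0, 0.4]},
    index=pd.to_datetime(["2018-01-15", "2018-01-18",
                          "2018-01-15", "2018-01-17"]),
)

# Forward fill within each house: every missing day repeats the most
# recent earlier read of the same house.
filled = toy.groupby("house")["readvalue"].resample("D").ffill()
print(filled)
```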

bfill(): backward filling

Using bfill() instead of mean() backward-fills the NaNs:

df_bfill = df.groupby('house')\
            .resample('D')\
            .bfill()\
            .drop('house', axis=1, errors='ignore')  # 'house' is already in the index
df_bfill.head(4)

Filling using bfill()
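Backward filling borrows each value from the next observed read, which is exactly the situation the forecasting caveat above warns about. A minimal sketch on hypothetical toy data (names and values are illustrative only):

```python
import pandas as pd

# Hypothetical toy data: two houses with gaps between their reads
toy = pd.DataFrame(
    {"house": ["house1", "house1", "house2", "house2"],
     "readvalue": [0.2, 0.8, 1.0, 0.4]},
    index=pd.to_datetime(["2018-01-15", "2018-01-18",
                          "2018-01-15", "2018-01-17"]),
)

# Backward fill within each house: every missing day takes the value of
# the next later read, i.e. information from the future.
back = toy.groupby("house")["readvalue"].resample("D").bfill()
print(back)
```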

interpolate(): interpolating

If we want to linearly interpolate the missing values, we need to do this in two steps. First, we generate the underlying data grid by using mean(). This generates the grid with NaNs as values. Afterwards, we fill the NaNs with interpolated values by calling the interpolate() method on the read value column:

# Select the numeric column so mean() is not applied to the 'house' strings
df_interpol = df.groupby('house')[['readvalue']]\
                .resample('D')\
                .mean()
df_interpol['readvalue'] = df_interpol['readvalue'].interpolate()
df_interpol.head(4)

Filling using interpolate()
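Calling interpolate() on the whole column treats the stacked houses as one long sequence. That is harmless here, because each house's daily grid starts and ends on an observed read, but the per-house boundary can be made explicit. The following is a sketch of one way to do that on hypothetical toy data, not necessarily how the original notebook handles it.

```python
import pandas as pd

# Hypothetical toy data: two houses with gaps between their reads
toy = pd.DataFrame(
    {"house": ["house1", "house1", "house2", "house2"],
     "readvalue": [0.2, 0.8, 1.0, 0.4]},
    index=pd.to_datetime(["2018-01-15", "2018-01-18",
                          "2018-01-15", "2018-01-17"]),
)

# Step 1: build the daily grid per house (missing days become NaN)
grid = toy.groupby("house")["readvalue"].resample("D").mean()

# Step 2: interpolate linearly inside each house (index level 0),
# so values can never blend across two houses.
interp = grid.groupby(level=0).transform(lambda s: s.interpolate())
print(interp)
```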

Visualizing the Results

Finally, we can visualize the three different filling methods to get a better idea of their results. The opaque dots show the raw data, the transparent dots show the interpolated values.

In the top figure, the gaps have been filled with the previously known value. In the middle figure, they have been filled with the next known value. In the bottom figure, the values between two known reads have been linearly interpolated. Note the kinks in the interpolated lines, which are due to the linearity of the interpolation process. Depending on the task, we could use higher-order methods to avoid these kinks, but this would be going too far for this post.

Original data (dark) and interpolated data (light), interpolated using (top) forward filling, (middle) backward filling and (bottom) interpolation.

Summary

In this post we have seen how we can use Python’s Pandas module to interpolate time series data using either backfill, forward fill or interpolation methods.

How do you convert quarterly data to monthly data in Python?

Create a for loop that converts each qrt in the quarters list into months with the formula round(qrt/3, 2). The second argument, 2, makes round() round each result to two decimal places.

How is monthly calculated in pandas?

I want to calculate the average values per month. So the pseudocode for each month is as follows: Sum all the values for each day present in that month. Divide by the number of days with data for that month.
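The pseudocode above amounts to a per-month mean. A minimal sketch, assuming a Series with a DatetimeIndex (the name daily and its values are hypothetical):

```python
import pandas as pd

# Hypothetical daily readings; not every day has data
daily = pd.Series(
    [1.0, 3.0, 5.0, 10.0, 20.0],
    index=pd.to_datetime(["2021-01-01", "2021-01-15", "2021-01-31",
                          "2021-02-01", "2021-02-10"]),
)

# Group by calendar month. mean() sums the values in each month and
# divides by the number of days that actually have data.
monthly_mean = daily.groupby(daily.index.to_period("M")).mean()
print(monthly_mean)
```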

How do you resample time series data?

Resample time-series data.

Convenience method for frequency conversion and resampling of time series.

Upsample the series into 30-second bins and fill the NaN values using the ffill method.

Upsample the series into 30-second bins and fill the NaN values using the bfill method.
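As a rough illustration of the two upsampling variants, the sketch below uses a hypothetical minute-level series. The bin width is passed as a pd.Timedelta rather than a frequency string, which is a choice made here to avoid depending on alias spelling.

```python
import pandas as pd

# Hypothetical series with one observation per minute
s = pd.Series(
    [0, 1, 2],
    index=pd.date_range("2000-01-01", periods=3,
                        freq=pd.Timedelta(minutes=1)),
)

rule = pd.Timedelta(seconds=30)        # 30-second bins
up_ffill = s.resample(rule).ffill()    # new bins repeat the previous value
up_bfill = s.resample(rule).bfill()    # new bins take the next value
print(up_ffill)
print(up_bfill)
```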

How do you resample data in Python?

Resample hourly data to daily data with the resample() method. To aggregate or temporally resample the data for a time period, you can take all of the values for each day and summarize them. In this case, you want total daily rainfall, so you will use the resample() method together with .sum().
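A short sketch of that downsampling step, using made-up hourly rainfall values (the names rain and daily_total are hypothetical):

```python
import pandas as pd

# Hypothetical hourly rainfall: 0.5 units every hour for two full days
hours = pd.date_range("2021-06-01", periods=48,
                      freq=pd.Timedelta(hours=1))
rain = pd.Series([0.5] * 48, index=hours)

# Downsample to daily frequency and add up the 24 hourly values per day
daily_total = rain.resample("D").sum()
print(daily_total)
```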