Pandas CustomBusinessHour reduced performance

Issue

I need to reduce the performance of the following operation :

st = time.time()
bh = CustomBusinessHour(start='00:00', end='23:00')
bdates = pd.date_range(start='2024-01-01 00:00:00', end='2024-12-31 23:00:00', freq=bh, name='ts', closed=None)
print_statistics(f'{bdates}', 'filter_bp()', 'utils.py', time.time() - st)

which gives has the following output :

DatetimeIndex(['2024-01-01 00:00:00', '2024-01-01 01:00:00',
               '2024-01-01 02:00:00', '2024-01-01 03:00:00',
               '2024-01-01 04:00:00', '2024-01-01 05:00:00',
               '2024-01-01 06:00:00', '2024-01-01 07:00:00',
               '2024-01-01 08:00:00', '2024-01-01 09:00:00',
               ...
               '2024-12-31 13:00:00', '2024-12-31 14:00:00',
               '2024-12-31 15:00:00', '2024-12-31 16:00:00',
               '2024-12-31 17:00:00', '2024-12-31 18:00:00',
               '2024-12-31 19:00:00', '2024-12-31 20:00:00',
               '2024-12-31 21:00:00', '2024-12-31 22:00:00'],
              dtype='datetime64[ns]', name='ts', length=6026, freq='CBH')                                

Execution time: 0.40148735046386718750 s

If I try just a similar operation without CustomBusinessHour I have a much better execution time :

st = time.time()
test1 = pd.date_range(start='2024-01-01 00:00:00', end='2024-12-31 23:00:00', freq='H', name='ts', closed=None)
print_statistics(f'{test1}', 'filter_bp()', 'utils.py', time.time() - st)

which gives has the following output :

DatetimeIndex(['2024-01-01 00:00:00', '2024-01-01 01:00:00',
               '2024-01-01 02:00:00', '2024-01-01 03:00:00',
               '2024-01-01 04:00:00', '2024-01-01 05:00:00',
               '2024-01-01 06:00:00', '2024-01-01 07:00:00',
               '2024-01-01 08:00:00', '2024-01-01 09:00:00',
               ...
               '2024-12-31 14:00:00', '2024-12-31 15:00:00',
               '2024-12-31 16:00:00', '2024-12-31 17:00:00',
               '2024-12-31 18:00:00', '2024-12-31 19:00:00',
               '2024-12-31 20:00:00', '2024-12-31 21:00:00',
               '2024-12-31 22:00:00', '2024-12-31 23:00:00'],
              dtype='datetime64[ns]', name='ts', length=8784, freq='H')  

Execution time: 0.00157237052917480469 s

So the execution time is 400 times faster, just have to remove those records.

Anyone has any idea how to do it ? Many thanks !

Solution

The solution is the following portion of code :

st = time.time()
test1 = pd.date_range(start='2024-01-01 00:00:00', end='2024-12-31 23:00:00', freq='H', name='ts', closed=None)
test2 = pd.DatetimeIndex(list(filter(lambda x: (np.is_busday(np.datetime64(x, 'D'))) and (x.hour != 23), test1)))
print_statistics(f'{test2}', 'filter_bp()', 'utils.py', time.time() - st)

which gives the following output :

DatetimeIndex(['2024-01-01 00:00:00', '2024-01-01 01:00:00',
               '2024-01-01 02:00:00', '2024-01-01 03:00:00',
               '2024-01-01 04:00:00', '2024-01-01 05:00:00',
               '2024-01-01 06:00:00', '2024-01-01 07:00:00',
               '2024-01-01 08:00:00', '2024-01-01 09:00:00',
               ...
               '2024-12-31 13:00:00', '2024-12-31 14:00:00',
               '2024-12-31 15:00:00', '2024-12-31 16:00:00',
               '2024-12-31 17:00:00', '2024-12-31 18:00:00',
               '2024-12-31 19:00:00', '2024-12-31 20:00:00',
               '2024-12-31 21:00:00', '2024-12-31 22:00:00'],
              dtype='datetime64[ns]', length=6026, freq=None)                                Execution time: 0.13873147964477539062 s

Answered By – R13mus

Answer Checked By – David Goodson (AngularFixing Volunteer)

Leave a Reply

Your email address will not be published.