Pandas: choose a single value from a delimited list in a column based on date

Issue

Apologies in advance for the poor description; I’m struggling to articulate this well. I have the following data in a Pandas dataframe:

id symbols   symbol_dates
1  ABC       20070103:29991231
2  DEF;GH    20100307:20141215;20141216:29991231
3  IJ;KLM;NO 20040107:20051105;20051106:20180316;20180317:29991231
4  PQ        20080103:20090613
5  RST;UV    20080206:20150603;20150604:29991231
6  WXY       20070103:20130516

In the above, the values in ‘symbols’ and ‘symbol_dates’ are delimited by ‘;’ and correspond by position, with each date range written as start:end in YYYYMMDD form. For example, for id 2, the symbol for the date range ‘20141216:29991231’ is ‘GH’.

I want to be able to input a date in Python and get back a new dataframe that contains the correct symbol for that date. If the input date does not fall within any of a row’s date ranges, that row should be dropped. For example, an input date of ‘20141220’ should result in the following new dataframe:

id symbol
1 ABC
2 GH
3 KLM
5 RST

I’m stuck on where to even begin with this—any ideas would be appreciated.

Solution

import pandas as pd

# Split on ';':
df.symbols = df.symbols.str.split(';')
df.symbol_dates = df.symbol_dates.str.split(';')

# Explode both columns together (requires pandas 1.3 or later):
df = df.explode(['symbols', 'symbol_dates'])

# Split on ':' and expand:
df[['date_start', 'date_end']] = df.symbol_dates.str.split(':', expand=True)

# Drop now unneeded column:
df = df.drop('symbol_dates', axis=1)

# Convert to timestamps. Note %m (month), not %M (minute). The 29991231
# end dates are out of range for nanosecond timestamps and become NaT.
for col in ['date_start', 'date_end']:
    df[col] = pd.to_datetime(df[col], format='%Y%m%d', errors='coerce')

# Treat those open-ended ranges as ending on a far-future date:
df['date_end'] = df['date_end'].fillna(pd.Timestamp('2262-04-11'))
print(df)

Output:

   id symbols date_start   date_end
0   1     ABC 2007-01-03 2262-04-11
1   2     DEF 2010-03-07 2014-12-15
1   2      GH 2014-12-16 2262-04-11
2   3      IJ 2004-01-07 2005-11-05
2   3     KLM 2005-11-06 2018-03-16
2   3      NO 2018-03-17 2262-04-11
3   4      PQ 2008-01-03 2009-06-13
4   5     RST 2008-02-06 2015-06-03
4   5      UV 2015-06-04 2262-04-11
5   6     WXY 2007-01-03 2013-05-16
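
The steps above reshape the data but stop before the lookup the question asks for. Below is a minimal, self-contained sketch of the whole task. It is not part of the original answer and rests on a few assumptions: the helper name `symbols_on` is hypothetical, both ends of each date range are treated as inclusive, and the dates are compared as plain integers rather than timestamps. The integer comparison is a swapped-in technique that works because YYYYMMDD values sort in date order, and it means 29991231 needs no special handling.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "symbols": ["ABC", "DEF;GH", "IJ;KLM;NO", "PQ", "RST;UV", "WXY"],
    "symbol_dates": [
        "20070103:29991231",
        "20100307:20141215;20141216:29991231",
        "20040107:20051105;20051106:20180316;20180317:29991231",
        "20080103:20090613",
        "20080206:20150603;20150604:29991231",
        "20070103:20130516",
    ],
})


def symbols_on(frame, date):
    """Return the id and symbol active on `date` (a 'YYYYMMDD' string).

    Hypothetical helper; assumes both range endpoints are inclusive.
    """
    # One row per (symbol, date range) pair; the input frame is not modified.
    out = frame.assign(
        symbols=frame["symbols"].str.split(";"),
        symbol_dates=frame["symbol_dates"].str.split(";"),
    ).explode(["symbols", "symbol_dates"], ignore_index=True)

    # Column 0 holds the range start, column 1 the range end, as integers.
    bounds = out["symbol_dates"].str.split(":", expand=True).astype(int)
    target = int(date)
    active = (bounds[0] <= target) & (target <= bounds[1])

    # Rows with no matching range simply disappear from the result.
    return (
        out.loc[active, ["id", "symbols"]]
        .rename(columns={"symbols": "symbol"})
        .reset_index(drop=True)
    )


result = symbols_on(df, "20141220")
print(result)
```

For the input ‘20141220’ this returns ids 1, 2, 3 and 5 with symbols ABC, GH, KLM and RST, matching the expected result in the question. A date earlier than every range start, such as ‘20000101’, returns an empty dataframe.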

Answered By – BeRT2me

Answer Checked By – Jay B. (AngularFixing Admin)
