Find all date ranges for overlapping start and end dates in R

Issue

I have a data frame that looks like this:

w<-read.table(header=TRUE,text="
start.date   end.date
2006-06-26 2006-07-24
2006-07-19 2006-08-16
2007-06-09 2007-07-07
2007-06-24 2007-07-22
2007-07-03 2007-07-31
2007-08-04 2007-09-01
2007-08-07 2007-09-04
2007-09-05 2007-10-03
2007-09-14 2007-10-12
2007-10-19 2007-11-16
2007-11-17 2007-12-15
2008-06-18 2008-07-16
2008-06-28 2008-07-26
2008-07-11 2008-08-08
2008-07-23 2008-08-20")

I’m trying to get an output that will combine overlapping start and end dates into one date range. So for the example set, I’d like to get:

w<-read.table(header=TRUE,text="
start.date   end.date
2006-06-26 2006-08-16
2007-06-09 2007-07-31
2007-08-04 2007-09-04
2007-09-05 2007-10-12
2007-10-19 2007-11-16
2007-11-17 2007-12-15
2008-06-18 2008-08-20")

The question is similar to Date roll-up in R, but I don’t need to do any sort of group by on mine, so the answer there is confusing.

Also, the code that was suggested in response to my question below does not work for certain parts of my data frame such as:

x<-read.table(header=TRUE,text="start.date   end.date
2006-01-19 2006-01-20
2006-01-25 2006-01-29
2006-02-24 2006-02-25
2006-03-15 2006-03-22
2006-04-29 2006-04-30
2006-05-24 2006-05-25
2006-06-26 2006-08-16
2006-07-05 2006-07-10
2006-07-12 2006-07-21
2006-08-13 2006-08-15
2006-08-18 2006-08-19
2006-08-28 2006-09-02")

I am confused why it does not?

Solution

Try this:

w[] <- lapply(w, function(x) as.Date(x, '%Y-%m-%d'))
w <- w[order(w$start.date),] # sort the data by start dates if already not sorted
w$group <- 1:nrow(w) # common intervals should belong to same group
merge.indices <- lapply(2:nrow(w), function(x) {
                    indices <- which(findInterval(w$end.date[1:(x-1)], w$start.date[x])==1)
                    if (length(indices) > 0) indices <- c(indices, x) 
                    indices})
# assign the intervals the right groups
for (i in 1:length(merge.indices)) {
  if (length(merge.indices[[i]]) > 0) {
    w$group[merge.indices[[i]]] <- min(w$group[merge.indices[[i]]])
  }
}

do.call(rbind, lapply(split(w, w$group), function(x) data.frame(start.date=min(x[,1]), end.date=max(x[,2]))))

It conceptually merges overlapping intervals into the same group as shown below:
enter image description here

with output:

   start.date   end.date
1  2006-01-19 2006-01-20
2  2006-01-25 2006-01-29
3  2006-02-24 2006-02-25
4  2006-03-15 2006-03-22
5  2006-04-29 2006-04-30
6  2006-05-24 2006-05-25
7  2006-06-26 2006-08-16
11 2006-08-18 2006-08-19
12 2006-08-28 2006-09-02

Answered By – Sandipan Dey

Answer Checked By – Candace Johnson (AngularFixing Volunteer)

Leave a Reply

Your email address will not be published.