Data structure for querying if a given interval is enclosed by a set of other intervals

Issue

There are millions of non-overlapping contiguous intervals, like [a, c), [c, e), [e, g)…
They are sent to me in a random order and I would like to query at any time if some other given interval can be enclosed by a combination of those contiguous intervals received.

For instance, I want `my_interval_set` to have a method `add(c)` to add one of the contiguous intervals `c`, a method `enclose(i)` to test whether an arbitrary interval `i` can be enclosed by a combination of intervals added before.

Something like

``````# add [a, c), [c, e)

my_interval_set.enclose([a,e])  # should return true
``````

Wondering what would be a suitable data structure for this case?

If it makes a difference, we can assume the boundaries of an `i` will always match the boundaries of some `c`s, so in the above example there will not a be a query `my_interval_set.enclose([b,d])` as b falls within the interval [a,c).

If there has to be trade-off between the efficiency of two operations, `enclose()` is more important to me.

My attempt is to use insertion sort and merge adjacent intervals (like `[a,c)+[c,e] -> [a,e)`) as one being added, for processing an enclose query we just scan over the whole list once.

``````def add(c):
idx = my_list.insert(c)  # insertion sort the interval c
my_list.merge_if_adjacent(idx)  # check neighbour elements of c and merge if needed

def enclose(i):
for interval in my_list:
if interval.lo <= i.lo and interval.hi >= i.hi:
return True
return False
``````

In the worst case both `add()` and `enclose()` are O(n) time complexity, which seems inefficient to me as there could millions of contiguous intervals to be added and queried from.

Edit:
based on btilly@’s answer, wrote some pseudocode as a note to myself

``````# assume a RBTree data structure with following operations
RBTree.insert(x, flag)
RBTree.delete(x)
RBTree.search(x) -> val, flag
RBTree.predecessor(x) -> val, flag
RBTree.successor(x) -> val, flag  # ref: https://baihuqian.github.io/2018-07-30-inorder-successor-in-bst/

def insert(tree, a, b):
node_a = tree.search(a)
node_b = tree.search(b)
if not sanity_check(node_a, node_b):
return False

if node_a is None:
tree.insert(a, START)
else:
tree.delete(node_a)

if node_b is None:
tree.insert(b, END)
else:
tree.delete(node_b)

def enclose(tree, a, b):
node_a = tree.search(a)
if node_a is None:
node_a = tree.predecessor(a)
node_b = tree.successor(node_a[0])
if node_b[0] >= b:
return True
else:
return False

def sanity_check(node_a, node_b):
# node_a can only be None or an END node
# node_b can only be None or a START node
# no other nodes in between
# ...
``````

Solution

Have a balanced binary tree of some sort of endpoints along with whether they are open or closed.

To insert `[a, b]` your logic will look like this:

``````let x be the last event before a (if any)
let y be the next event after b (if any)
delete all events between x and y (if any)
if x doesn't exist or is a close:
insert (a, open) into the tree
if y doesn't exist or is an open:
insert (b, close) into the tree
``````

Now it may be that inserting an interval will be very expensive (because you are dealing with a long range). But the amortized cost of every interval is `O(log(n))` because that’s the cost to do the lookup, the cost to insert if you need to, and the cost to delete later.

The cost of testing enclosure is `O(log(n))`. Find the last event before or equal to the start of the interval. If it is an open, and the next event is a close at or after the end, then it is enclosed. Else not.