Data structure for querying if a given interval is enclosed by a set of other intervals

Issue

There are millions of non-overlapping contiguous intervals, like [a, c), [c, e), [e, g)…
They are sent to me in a random order and I would like to query at any time if some other given interval can be enclosed by a combination of those contiguous intervals received.

For instance, I want my_interval_set to have a method add(c) to add one of the contiguous intervals c, a method enclose(i) to test whether an arbitrary interval i can be enclosed by a combination of intervals added before.

Something like

# add [a, c), [c, e)
my_interval_set.add([c,e])
my_interval_set.add([a,c])

my_interval_set.enclose([a,e])  # should return true

Wondering what would be a suitable data structure for this case?

If it makes a difference, we can assume the boundaries of an i will always match the boundaries of some cs, so in the above example there will not a be a query my_interval_set.enclose([b,d]) as b falls within the interval [a,c).

If there has to be trade-off between the efficiency of two operations, enclose() is more important to me.


My attempt is to use insertion sort and merge adjacent intervals (like [a,c)+[c,e] -> [a,e)) as one being added, for processing an enclose query we just scan over the whole list once.

def add(c):
    idx = my_list.insert(c)  # insertion sort the interval c
    my_list.merge_if_adjacent(idx)  # check neighbour elements of c and merge if needed


def enclose(i):
    for interval in my_list:
        if interval.lo <= i.lo and interval.hi >= i.hi:
            return True
    return False

In the worst case both add() and enclose() are O(n) time complexity, which seems inefficient to me as there could millions of contiguous intervals to be added and queried from.


Edit:
based on btilly@’s answer, wrote some pseudocode as a note to myself

# assume a RBTree data structure with following operations
RBTree.insert(x, flag)
RBTree.delete(x)
RBTree.search(x) -> val, flag
RBTree.predecessor(x) -> val, flag
RBTree.successor(x) -> val, flag  # ref: https://baihuqian.github.io/2018-07-30-inorder-successor-in-bst/


def insert(tree, a, b):
    node_a = tree.search(a)
    node_b = tree.search(b)
    if not sanity_check(node_a, node_b):
        return False

    if node_a is None:
        tree.insert(a, START)
    else:
        tree.delete(node_a)

    if node_b is None:
        tree.insert(b, END)
    else:
        tree.delete(node_b)

def enclose(tree, a, b):
    node_a = tree.search(a)
    if node_a is None:
        node_a = tree.predecessor(a)
    node_b = tree.successor(node_a[0])
    if node_b[0] >= b:
        return True
    else:
        return False

def sanity_check(node_a, node_b):
    # node_a can only be None or an END node
    # node_b can only be None or a START node
    # no other nodes in between
    # ...

Solution

Have a balanced binary tree of some sort of endpoints along with whether they are open or closed.

To insert [a, b] your logic will look like this:

let x be the last event before a (if any)
let y be the next event after b (if any)
delete all events between x and y (if any)
if x doesn't exist or is a close:
    insert (a, open) into the tree
if y doesn't exist or is an open:
    insert (b, close) into the tree

Now it may be that inserting an interval will be very expensive (because you are dealing with a long range). But the amortized cost of every interval is O(log(n)) because that’s the cost to do the lookup, the cost to insert if you need to, and the cost to delete later.

The cost of testing enclosure is O(log(n)). Find the last event before or equal to the start of the interval. If it is an open, and the next event is a close at or after the end, then it is enclosed. Else not.

Answered By – btilly

Answer Checked By – Dawn Plyler (AngularFixing Volunteer)

Leave a Reply

Your email address will not be published.