There are millions of non-overlapping contiguous intervals, like [a, c), [c, e), [e, g)…
They are sent to me in a random order and I would like to query at any time if some other given interval can be enclosed by a combination of those contiguous intervals received.
For instance, I want
my_interval_set to have a method
add(c) to add one of the contiguous intervals
c, a method
enclose(i) to test whether an arbitrary interval
i can be enclosed by a combination of intervals added before.
# add [a, c), [c, e) my_interval_set.add([c,e]) my_interval_set.add([a,c]) my_interval_set.enclose([a,e]) # should return true
Wondering what would be a suitable data structure for this case?
If it makes a difference, we can assume the boundaries of an
i will always match the boundaries of some
cs, so in the above example there will not a be a query
my_interval_set.enclose([b,d]) as b falls within the interval [a,c).
If there has to be trade-off between the efficiency of two operations,
enclose() is more important to me.
My attempt is to use insertion sort and merge adjacent intervals (like
[a,c)+[c,e] -> [a,e)) as one being added, for processing an enclose query we just scan over the whole list once.
def add(c): idx = my_list.insert(c) # insertion sort the interval c my_list.merge_if_adjacent(idx) # check neighbour elements of c and merge if needed def enclose(i): for interval in my_list: if interval.lo <= i.lo and interval.hi >= i.hi: return True return False
In the worst case both
enclose() are O(n) time complexity, which seems inefficient to me as there could millions of contiguous intervals to be added and queried from.
based on btilly@’s answer, wrote some pseudocode as a note to myself
# assume a RBTree data structure with following operations RBTree.insert(x, flag) RBTree.delete(x) RBTree.search(x) -> val, flag RBTree.predecessor(x) -> val, flag RBTree.successor(x) -> val, flag # ref: https://baihuqian.github.io/2018-07-30-inorder-successor-in-bst/ def insert(tree, a, b): node_a = tree.search(a) node_b = tree.search(b) if not sanity_check(node_a, node_b): return False if node_a is None: tree.insert(a, START) else: tree.delete(node_a) if node_b is None: tree.insert(b, END) else: tree.delete(node_b) def enclose(tree, a, b): node_a = tree.search(a) if node_a is None: node_a = tree.predecessor(a) node_b = tree.successor(node_a) if node_b >= b: return True else: return False def sanity_check(node_a, node_b): # node_a can only be None or an END node # node_b can only be None or a START node # no other nodes in between # ...
Have a balanced binary tree of some sort of endpoints along with whether they are open or closed.
[a, b] your logic will look like this:
let x be the last event before a (if any) let y be the next event after b (if any) delete all events between x and y (if any) if x doesn't exist or is a close: insert (a, open) into the tree if y doesn't exist or is an open: insert (b, close) into the tree
Now it may be that inserting an interval will be very expensive (because you are dealing with a long range). But the amortized cost of every interval is
O(log(n)) because that’s the cost to do the lookup, the cost to insert if you need to, and the cost to delete later.
The cost of testing enclosure is
O(log(n)). Find the last event before or equal to the start of the interval. If it is an open, and the next event is a close at or after the end, then it is enclosed. Else not.
Answered By – btilly
Answer Checked By – Dawn Plyler (AngularFixing Volunteer)