CSES Range Query Question: Salary Queries
You have the right idea in using index compression - great thinking! As N
is only up to 200000
, keeping an occurrence array will at most require 200000
elements for the initial values of the given array, instead of 10^9
array indices.
According to yourself, the problem you face is when you encounter new values during processing queries. You're right; this would throw the occurrence array off-balance and cause updates to have to run in O(N)
time. The solution to this problem is just a tiny modification to your current method.
To solve the problem of encountering new values, we can just make sure that we'll never encounter any new values. We can do this by reading in all the queries before constructing the sum segment tree. This will result in a maximum of N + 2*Q
unique values, or 600000
in the worst case, which is far enough to construct an occurrence array with the problem's 512MB storage limit. After that, a sum segment tree will be able to answer these queries efficiently.
So in the end, a strategy to solve this problem would be to input every unique number, then construct an index compression function, then use a sum segment tree to efficiently process sum queries.
In the future, remember that in these sorts of query-answering questions, it might be useful to read in ALL the input before precomputation. Good luck with your program.
First, consider the naive: For each update, update the array. For each query, scan through the entire array and collect your answer. The complexity of this solution has O(n)
updates, O(n)
queries. No good.
We can come up with a different solution with arguably worse time complexity, but it gives us a hint as to what our end result is. Maintain the source array at all times, but also keep a hash map of value->frequency. Then, when you update, decrement the frequency at the old value and increment it at the new value. Now, for queries, loop through all values of that query range and sum them for your answer. This results in O(1)
updates and O(r-l)
queries, so we have excellent updates but awful queries. However, this result can be improved upon if we can just speed up those queries! Enter the Segment Tree.
Traditionally, you'd construct a segment tree all the way down to its leaves upon creation. However, we'd nominally like a segment tree that ranges from 0-10^9
, so there's absolutely no way that we could generate that much memory (and we'd time out in doing so). However, what if we create a segment tree, but for every node, its children are implicit if they've never been used. That is, don't create child nodes if there aren't elements in them. This structure is named, aptly, the Implicit Segment Tree. The idea here is implement your segment tree as normal except skip the part in the constructor where you initialize your left and right children. Now, when you need to delve into your children due to a partial range query, check if they exist, and if they don't, create them. Otherwise, since you've never needed to make them, assume the sum of the values in those nodes is 0!
The final solution is as follows: Create a segment tree of the max value queryable (if you don't have to answer interactively, consider saving and scanning your queries to find the max r-value, but you don't have to). Note to make this an implicit segment tree. Maintain the source array after every update, and also do point-updates on your tree which will be O(log(max value))
. Queries are regular segment tree range queries, so these will be O(log(max value))
. And there it is!