How to correctly set Content-Length in Spark Java after the gzip header

lepe :

I'm using Spark to serve different kinds of content. "Content-Length" is calculated correctly, but I'm facing an issue when using:

response.header("Content-Encoding", "gzip")

According to their documentation, Spark will automatically gzip the content when that header is set... and it does.

However, the "Content-Length" that I previously calculated is no longer valid, and thus I get a 'net::ERR_CONTENT_LENGTH_MISMATCH' error in the browser.

Gzipping it myself and calculating the resulting size is not possible, as Spark would then compress the output again.

How can I know the resulting size after Spark compresses the output?

More details:

I created a library on top of Spark which sets such headers automatically; the interesting part looks like this (simplified):

if (request.headers("Accept-Encoding")?.contains("gzip") == true) {
    response.header("Content-Encoding", "gzip")
    // How to get or calculate the resulting size?
    response.header("Content-Length", ???????)
}

The problem is that Spark does not set the "Content-Length" header automatically, so I'm trying to add it myself. The calculation is correct (without compression) up to that point, but since Spark is going to compress the output (because it detects "gzip" as the encoding), I don't have a reliable way to set it correctly.

The ways I can think of to fix this issue are:

  1. Wait until Spark adds that header automatically (or roll my own branch).
  2. Find a way to get that size after Spark has compressed the output.
  3. Compress it in the same way Spark does so I can calculate the size (ugly, as it compresses the output twice == CPU waste).

My current solution is not to set the Content-Length header when using the gzip header (but it's not ideal for large files, as the browser won't know what percentage has already been downloaded).

I hope these details shed more light on the situation.

K.H. :

Thanks for the clarification!

  1. Yeah, for now you are adding it manually; that is what I'd do, and I'd keep it that way unless you really need Content-Length for your use cases. Not knowing the size is a bit annoying, but not that uncommon.
  2. I am pretty sure this is very hard to do with Spark's current internal API. I played around with it yesterday, intercepting OutputStreams with Apache Commons' CountingOutputStream, and there is no way to hook in without changing Spark's code; there are other problems with it too. Another problem is that by the time Spark has compressed the output, it has very likely already been flushed and sent to the client, but this header has to be sent before the data. You basically have to know the size before you start sending data, so this is the hardest way to go.
  3. Yeah, the way that would be easiest to implement with Spark is probably to serve it already-compressed data as a ByteArray (it seems you're using Kotlin) and disable auto-compression; a ByteArrayOutputStream is a good way to go. That way it's at least compressed only once. There is also the issue of setting the Content-Encoding header while forcing Spark not to encode, but that is an easy patch. The ugly part is that you have to hold the whole payload in memory, and the server won't start sending data until it has all been pre-calculated, so there will be a delay between the user clicking download and the download starting. (A sketch of this approach follows this list.)
  4. If your big files will be requested many times, you can pre-calculate their gzipped size in advance (or on the first request) and cache that information. That way you can stream the data directly and still know the size up front. (A sketch of this is also shown below.)
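Here is a minimal Kotlin sketch of option 3. The function name sendGzipped is made up for illustration, and writing directly to response.raw() is assumed to keep Spark from compressing the body a second time (whether it does is version-dependent, as noted above):

import spark.Request
import spark.Response
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream

// Compress once into memory so the exact size is known before any byte is sent.
fun sendGzipped(request: Request, response: Response, body: ByteArray) {
    if (request.headers("Accept-Encoding")?.contains("gzip") != true) {
        response.header("Content-Length", body.size.toString())
        response.raw().outputStream.use { it.write(body) }
        return
    }
    val buffer = ByteArrayOutputStream()
    GZIPOutputStream(buffer).use { it.write(body) }  // closing writes the gzip trailer
    val compressed = buffer.toByteArray()
    response.header("Content-Encoding", "gzip")
    response.header("Content-Length", compressed.size.toString())
    response.raw().outputStream.use { it.write(compressed) }
}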
