Memory-Efficient Byte Processing: Streaming for Large Blobs

Hello folks! In this article I'm going to share some tips on minimizing memory usage (RAM) while dealing with large blobs of data, whether that's downloading files, reading data from a source and writing it to a destination, and so on. I'll run a demo in Go, monitor its memory usage, and show why streaming the data from source to destination is the better approach. Let's get started.
Naive Approach
Let’s say we need to download a large file in our application code and save it somewhere on disk.
The naive approach looks like this:
func downloadFile(filepath string, url string) error {
	out, err := os.Create(filepath)
	if err != nil {
		return err
	}
	defer out.Close()

	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	data, err := io.ReadAll(resp.Body) // Whole file in memory btw
	if err != nil {
		return err
	}
	printMemStats()

	_, err = out.Write(data)
	return err
}
This snippet does the following:
Creates a new file for the downloaded data
Downloads the file
Copies the bytes from the downloaded blob to the created file
I have a helper function, printMemStats, that gives us info on memory usage.
Running this on a 100MB file and monitoring memory, we can deduce the following:
Alloc: 117.94 MB (currently in use)
TotalAlloc: 587.91 MB (total memory allocated since the program started, including memory since released by the garbage collector)
Sys: 253.92 MB (total memory reserved from the OS)
HeapAlloc: 117.94 MB (heap memory in use)
HeapSys: 247.11 MB (heap memory reserved from the OS)
NumGC: 21 (number of garbage collections)
Looking at the allocated memory, about 118 MB was in use, which makes sense: the downloaded file was 100 MB on its own, plus some extra memory required by the Go runtime.
Now imagine this file being 1 GB instead. Having a single operation in the Go app hog 1 GB of memory is very bad practice. We can do better, so let's discover the art of streaming data.
Streaming Data
The idea, simplified: instead of having the flow look like this

How about we get rid of the red part and go straight to writing to the file!

The write buffer is handled for us: when we call out.Write, the bytes are buffered (by the operating system's page cache, or by an explicit user-space buffer) before being flushed to disk, which cuts down on I/O calls, and those are very expensive.
What I love most about this is that it's just plain creativity: instead of downloading everything and then transferring it, we skip the intermediate blob entirely and write the bytes to the file as they come in, with minimal memory overhead. Let's translate this into our code and check the memory stats after the update.
func downloadFile(filepath string, url string) error {
	out, err := os.Create(filepath)
	if err != nil {
		return err
	}
	defer out.Close()

	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	_, err = io.Copy(out, resp.Body) // STREAM THE BODY TO FILE
	printMemStats()
	return err
}
Now io.Copy internally buffers between source and destination: it streams the data through a small internal buffer (32 KB by default), so only one chunk is in memory at a time.

Running the code, the memory stats are as follows:
Alloc: 0.549 MB
TotalAlloc: 0.549 MB
Sys: 8.209 MB
HeapAlloc: 0.549 MB
HeapSys: 3.776 MB
Almost a 99% decrease in memory! And it's not only memory efficient; it's much faster as well, because we skipped a whole step.
This isn't only useful for files or downloads. It also applies when passing blobs between functions: instead of handing a function the entire blob of bytes and keeping it all in memory, pass it a reader and let it process the data as it flows from source to destination.
Summary
Most modern open source applications use tricks like these to optimize for memory and performance, skipping buffers and unnecessary overhead wherever they can. It's a deep understanding of what goes on behind the scenes that opens the door to optimization. Visualizing the data flow also makes things much clearer and guides you on what to do.
