Buffered IO Streams In Ruby
Object Permanence
We have some really important information in our console.
> data = "some really important information"
=> "some really important information"
We want to store that information to disk, but only temporarily. We’ll do so using a Tempfile, which ships with Ruby but must be required before it can be used.
> require "tempfile"
> file = Tempfile.new
This creates a new temporary file on disk. Because it’s new, it is currently empty, which we’ll check in two different ways.
> File.size(file.path)
=> 0
> file.size
=> 0
Finally we can write some really important information to disk, after which we’ll check the size again.
> file.write(data)
=> 33
> File.size(file.path)
=> 0
> file.size
=> 33
When we write to the IO stream (in this case a file), the number of bytes written, 33, is returned. After writing to the file, we asked for its size with File.size and got 0. Then we asked the file itself for its size and got 33. What happened?
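The same experiment can be run as a standalone script instead of being typed into the console. This is only a minimal sketch: the method name write_and_measure is made up for illustration, and it assumes the string is small enough to sit in Ruby’s write buffer rather than being passed straight through.

```ruby
require "tempfile"

# Hypothetical helper: writes `data` to a fresh Tempfile and reports
# the return value of write, the size seen through the path, and the
# size seen through the open handle.
def write_and_measure(data)
  file = Tempfile.new
  written    = file.write(data)      # number of bytes handed to the stream
  on_disk    = File.size(file.path)  # asks the filesystem about the path
  via_handle = file.size             # asks the open file object
  { written: written, on_disk: on_disk, via_handle: via_handle }
ensure
  # Clean up the temporary file whether or not the steps above succeeded.
  file&.close
  file&.unlink
end

p write_and_measure("some really important information")
```

The three numbers should line up with the console session above: 33 bytes written, 0 on disk, and 33 through the handle.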
Where’s that string?
Maybe the string was written to the file instance in memory before being committed to disk. Let’s look at the size of our objects.
> require "objspace"
> ObjectSpace.memsize_of(data)
=> 40
> ObjectSpace.memsize_of(file)
=> 80
Now, memsize_of is only a hint, and the docs are clear about that:
Note that the return size is incomplete. You need to deal with this information as only a HINT.
That’ll work for us; we’ll just use it to see if it changed at all.
Let’s try to write again, now that we’ve seen the file object is currently reported as 80 bytes in memory.
> file.write(data)
=> 33
> ObjectSpace.memsize_of(file)
=> 80
The size of the file object itself didn’t change, so I guess it’s not hiding in there.
As we saw previously, passing the path to File.size doesn’t show the newly written bytes, but asking the file instance itself for its size does.
Also, after we ask for file.size, File.size(file.path) does report a size that includes the newly written bytes. So they do eventually agree on the file’s size.
> File.size(file.path)
=> 33
> file.size
=> 66
> File.size(file.path)
=> 66
Sizing Up The Difference
Calling size on the file instance has a documented side effect:
As a side effect, the IO buffer is flushed before determining the size.
That explains where our string went after writing it! It was stored in Ruby’s IO buffer. Flushing the buffer pushes its contents to the operating system.
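To see the side effect in isolation, here is a small sketch that measures the path-based size immediately before and after asking the open handle for its size, with nothing else in between. The helper name size_side_effect is hypothetical.

```ruby
require "tempfile"

# Hypothetical helper: returns the on-disk size before and after calling
# file.size. No explicit flush happens anywhere in this method.
def size_side_effect(data)
  file = Tempfile.new
  file.write(data)
  before = File.size(file.path)  # buffered bytes are not visible yet
  file.size                      # side effect: the IO buffer is flushed
  after  = File.size(file.path)  # the path-based size now includes them
  [before, after]
ensure
  file&.close
  file&.unlink
end

before, after = size_side_effect("some really important information")
puts "before file.size: #{before}, after file.size: #{after}"
```

If the documented behavior holds, the first number is 0 and the second is 33, even though the script never calls flush itself.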
Let’s observe the flush directly: we’ll check the size of the file, write more bytes to it, confirm that the size on disk hasn’t changed, and then explicitly flush the buffer.
After flushing the buffer, the size of the file grows by the number of bytes written.
> File.size(file.path)
=> 66
> file.write(data)
=> 33
> File.size(file.path)
=> 66
> file.flush
> File.size(file.path)
=> 99
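The size on disk is not the only thing affected. Anything that opens the file by its path is in the same position as File.size. The sketch below, with a made-up helper name, reads the file through a separate File.read call before and after an explicit flush.

```ruby
require "tempfile"

# Hypothetical helper: what does an independent reader, opening the file
# by its path, see before and after an explicit flush?
def visible_to_other_readers(data)
  file = Tempfile.new
  file.write(data)
  before_flush = File.read(file.path)  # separate open of the same path
  file.flush                           # hand Ruby's buffered bytes to the OS
  after_flush  = File.read(file.path)
  [before_flush, after_flush]
ensure
  file&.close
  file&.unlink
end

before_flush, after_flush = visible_to_other_readers("some really important information")
puts "before flush: #{before_flush.inspect}"
puts "after flush:  #{after_flush.inspect}"
```

Before the flush the independent reader should get an empty string, and afterwards the full contents.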
Rewinding the file after writing to it also appears to flush the buffer.
> File.size(file.path)
=> 99
> file.write(data)
=> 33
> File.size(file.path)
=> 99
> file.rewind
=> 0
> File.size(file.path)
=> 132
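This matters for the common pattern of writing to a temporary file, rewinding it, and reading it back through the same handle. A minimal sketch of that pattern follows; the helper name write_rewind_read is an assumption made for illustration.

```ruby
require "tempfile"

# Hypothetical helper: write, rewind, then read back through the same handle,
# checking the path-based size right after the rewind.
def write_rewind_read(data)
  file = Tempfile.new
  file.write(data)
  file.rewind                     # move the position back to the start
  on_disk = File.size(file.path)  # measured after the rewind, no explicit flush
  content = file.read             # read everything from the start
  { on_disk: on_disk, content: content }
ensure
  file&.close
  file&.unlink
end

result = write_rewind_read("some really important information")
puts "on disk after rewind: #{result[:on_disk]}"
puts "read back: #{result[:content].inspect}"
```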
No Buffering
We can bypass Ruby’s IO buffer by setting the stream’s sync mode. By default, sync is false, so writes are buffered; setting it to true makes each write go to the operating system immediately.
> new_file = Tempfile.new
> new_file.sync
=> false
> new_file.sync = true
> streaming = "no buffering"
> File.size(new_file.path)
=> 0
> new_file.write(streaming)
=> 12
> File.size(new_file.path)
=> 12
File.size now sees the bytes in the file without our needing to flush the buffer, either directly or via a method that does so as a side effect. With sync mode on, Ruby passes whatever we write straight to the operating system instead of holding it in its own buffer (the operating system may still do its own caching before the bytes physically reach the disk).
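The two modes can be compared side by side. This sketch uses a hypothetical helper, size_after_write, that takes the sync setting as a keyword argument and reports the path-based size immediately after a single write.

```ruby
require "tempfile"

# Hypothetical helper: on-disk size immediately after one write, with the
# stream's sync mode set as requested.
def size_after_write(data, sync:)
  file = Tempfile.new
  file.sync = sync        # false buffers in Ruby, true writes through
  file.write(data)
  File.size(file.path)    # no flush, size, or rewind before measuring
ensure
  file&.close
  file&.unlink
end

puts "sync = false: #{size_after_write("no buffering", sync: false)}"
puts "sync = true:  #{size_after_write("no buffering", sync: true)}"
```

With sync left at false the size should still be 0 right after the write, and with sync set to true it should be 12, as in the console session above.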
Closing Our Stream
Ruby buffers writes to an IO stream, such as a file. If you need to check the effect of a write immediately afterwards, be mindful of when, or whether, that buffer has been flushed.
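One more flush point worth keeping in mind is closing the stream, which also writes out anything still sitting in the buffer. The sketch below is an illustration under that assumption, with a made-up helper name; it relies on a Tempfile staying on disk after close until it is unlinked.

```ruby
require "tempfile"

# Hypothetical helper: on-disk size before and after closing the stream.
def size_around_close(data)
  file = Tempfile.new
  file.write(data)
  before_close = File.size(file.path)  # bytes still in Ruby's buffer
  file.close                           # closing flushes pending writes
  after_close  = File.size(file.path)  # file still exists until unlinked
  [before_close, after_close]
ensure
  file&.close
  file&.unlink
end

before_close, after_close = size_around_close("some really important information")
puts "before close: #{before_close}, after close: #{after_close}"
```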
This post was originally published on The Gnar Company blog.