Rust: File I/O.
In the next part of Mihalis Tsoukalos’ Rust series, he explains how to use file I/O in your Rust programs for such things as processing text files.
Mihalis Tsoukalos continues to get you up to speed with the bullet-proof language covering its file input and output operations.
Hopefully, you’ll be feeling more comfortable with Rust, which means it’s time to deal with a subject that’s very important in systems programming and operating systems: file I/O. Rust’s strict programming rules come in handy when dealing with files because they reduce errors and bugs and allow you to concentrate on what really matters, which is the algorithm and the operations that you want to perform. First, we’ll present some simple examples before continuing with more advanced topics, such as processing text files and memory-mapped file I/O. This tutorial will also cover some important Rust topics, including pointers, pattern matching and the usize and isize types. If at any time you feel that you need some help, you should review the previous two Rust tutorials [Coding Academy, p88, LXF211, p84, LXF210] or visit the Rust documentation web page. So, without further ado, let’s get started!
Command-line arguments
The following Rust code, which we’ve called arguments.rs, prints all command-line arguments given to a program, including the name of the executable:

    use std::env;

    fn main() {
        let mut counter = 0;
        for argument in env::args() {
            counter = counter + 1;
            println!("{}: {}", counter, argument);
        }
    }
As you will see in the rest of the tutorial, you can directly access a specific command-line argument. However, the presented technique is generic, simple and elegant.
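As a minimal sketch of direct access, the iterator returned by env::args() can be indexed with nth(). The helper name arg_or and the "<none>" fallback are assumptions made for this illustration; they are not part of arguments.rs.

```rust
use std::env;

/// Returns the argument at `index`, or `fallback` when it is missing.
/// (`arg_or` is a hypothetical helper, not part of the original code.)
fn arg_or<I>(mut args: I, index: usize, fallback: &str) -> String
where
    I: Iterator<Item = String>,
{
    args.nth(index).unwrap_or_else(|| fallback.to_string())
}

fn main() {
    // Index 0 is the name of the executable; index 1 is the first real argument.
    let first = arg_or(env::args(), 1, "<none>");
    println!("first argument: {}", first);

    // enumerate() replaces the hand-maintained counter; note that it
    // numbers from 0, whereas arguments.rs starts counting at 1.
    for (position, argument) in env::args().enumerate() {
        println!("{}: {}", position, argument);
    }
}
```

Using unwrap_or_else() instead of unwrap() means that a missing argument produces a fallback value rather than a panic.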
A simple example
The following Rust code, called simple.rs, demonstrates how to open a file for reading and how to create an empty file for writing, which are the two simplest forms of file I/O:

    use std::env;
    use std::fs::File;
    use std::path::Path;

    fn main() {
        // The first argument is the file to read, the second the file to create.
        let open = env::args().nth(1).unwrap();
        let write = env::args().nth(2).unwrap();

        let path = Path::new(&open);
        let display = path.display();
        let _file = match File::open(&path) {
            // `why` is an io::Error, which can be printed directly.
            Err(why) => panic!("couldn't open {}: {}", display, why),
            Ok(file) => file,
        };

        let write_path = Path::new(&write);
        let write_display = write_path.display();
        let _write_file = match File::create(&write_path) {
            Err(why) => panic!("couldn't create {}: {}", write_display, why),
            Ok(write_file) => write_file,
        };
    }
Both filenames are given as command-line arguments to simple.rs and converted to Path variables. Our example also includes error-handling code, which is necessary because bad things can happen when dealing with files and file permissions. The File::create() method opens a file in write-only mode. If the file already exists, its contents will be completely deleted; otherwise an empty file will be created (see bottom p88 to see simple.rs in action). There exist many crates that allow you to perform file I/O. The general rule is to use the library functions that you know best.
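The truncating behaviour of File::create() can be checked with a short sketch like the following. The helper name create_empty and the temporary filename are assumptions made for this illustration; errors are returned with the ? operator rather than handled with panic!.

```rust
use std::env;
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Creates (or truncates) the file at `path` and returns its length afterwards.
/// (`create_empty` is a hypothetical helper used only for this illustration.)
fn create_empty(path: &Path) -> io::Result<u64> {
    let file = File::create(path)?; // write-only; truncates an existing file
    Ok(file.metadata()?.len())
}

fn main() -> io::Result<()> {
    // An assumed scratch file in the system's temporary directory.
    let path = env::temp_dir().join("lxf_simple_demo.txt");

    // Put some content in the file first...
    let mut file = File::create(&path)?;
    file.write_all(b"old contents")?;
    drop(file);
    println!("before: {} bytes", fs::metadata(&path)?.len());

    // ...then create it again: the old contents are discarded.
    println!("after: {} bytes", create_empty(&path)?);

    // Once the file is removed, opening it yields an Err value to match on.
    fs::remove_file(&path)?;
    match File::open(&path) {
        Ok(_) => println!("unexpectedly opened {}", path.display()),
        Err(why) => println!("couldn't open {}: {}", path.display(), why),
    }
    Ok(())
}
```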
The next program, cp.rs, uses command-line arguments to get the name of the file you want to copy as well as the filename of the new file. Alongside the Rust code of cp.rs (see top p88), you can see various executions of it, including situations where error messages are produced as a result of error conditions that are caught by Rust. As you can see, cp.rs uses fs::copy to create the copy of a file.
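The code of cp.rs itself isn't reproduced in the text, so the following is only a sketch of how a copy utility built on fs::copy might look. The function name copy_file and the wording of the messages are assumptions.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::Path;

/// Copies `from` to `to` and returns the number of bytes copied.
/// (`copy_file` is a hypothetical wrapper; fs::copy does the real work.)
fn copy_file(from: &Path, to: &Path) -> io::Result<u64> {
    fs::copy(from, to)
}

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 3 {
        eprintln!("Usage: cp source destination");
        return;
    }
    // Report failures (missing source, bad permissions...) instead of panicking.
    match copy_file(Path::new(&args[1]), Path::new(&args[2])) {
        Ok(bytes) => println!("Copied {} bytes", bytes),
        Err(why) => eprintln!("couldn't copy {} to {}: {}", args[1], args[2], why),
    }
}
```

Like File::create(), fs::copy overwrites the destination file if it already exists.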
Pattern matching
We can now go on to learn about pattern matching in Rust, as it can be very useful in many situations, especially when you want to process text files and examine their contents. Pattern matching in Rust happens with the help of the ‘match’ keyword, as you briefly saw in the error-handling code. A match statement must catch all possible values of the variable used, so having a default branch at the end is very common. The default branch is defined with the help of the underscore character (‘_’), which is a ‘catch all’. However, in some rare situations, such as when you examine a condition that can be either true or false, a default branch isn’t needed. The Rust code (below) showcases the use of pattern matching in Rust:

    match an_age {
        0 => "Too young!!",
        -1 | -2 => "at the age of cheating!",
        // Inclusive ranges: current Rust writes ..= where older
        // compilers accepted a three-dot ellipsis (1...18).
        1..=18 => "Still very young!",
        19..=25 => "At the age of studying!",
        ready_for_work @ 26..=40 => "At the age of working!",
        _ if an_age % 10 == 0 => "at a new decade!",
        _ => "Unknown age!"
    }
As you can see, you can use inclusive ranges (written with a three-dot ellipsis, ‘...’, in older versions of Rust and with ‘..=’ in current ones) or multiple values as a list with the help of the ‘|’ character. Additionally, you can give a range a name, as happens with the 26-40 range, which has the name ready_for_work. Pattern matching also works on tuples and other types of variables. Executing the previous code, saved in reg_exp.rs, produces the following output:

    25: you are At the age of studying!
    26: you are At the age of working!
    ...
    40: you are At the age of working!
    41: you are Unknown age!
    ...
    50: you are at a new decade!
As you can see from the output, the order in which you put the various conditions plays a key role because only the first match gets executed, as happened with the age of 40. If you’re interested in regular expressions, then things are a little more complicated because you will have to use a separate crate, called regex, to do the job. Regular expressions and pattern matching play a key role in modern programming languages, so you should try as many examples as possible in order to really understand how both concepts work in Rust.
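To experiment with the ‘first match wins’ rule, the match expression can be wrapped in a function, as in the sketch below. The function names describe_age and locate, and the tuple example, are assumptions added for illustration; the age arms follow the article’s code, written with the current ‘..=’ range syntax.

```rust
/// Classifies an age using the same arms as the article's example.
fn describe_age(an_age: i32) -> &'static str {
    match an_age {
        0 => "Too young!!",
        -1 | -2 => "at the age of cheating!",
        1..=18 => "Still very young!",
        19..=25 => "At the age of studying!",
        26..=40 => "At the age of working!",
        // Only reached when no earlier arm matched, so 40 never gets here.
        _ if an_age % 10 == 0 => "at a new decade!",
        _ => "Unknown age!",
    }
}

/// Pattern matching on a tuple (hypothetical example).
fn locate(point: (i32, i32)) -> &'static str {
    match point {
        (0, 0) => "origin",
        (_, 0) => "on the x axis",
        (0, _) => "on the y axis",
        _ => "somewhere else",
    }
}

fn main() {
    for an_age in [25, 26, 40, 41, 50] {
        println!("{}: you are {}", an_age, describe_age(an_age));
    }
    println!("(3, 0) is {}", locate((3, 0)));
}
```

Moving the guarded arm above the 26..=40 range would make the age of 40 report a new decade instead, which is a quick way to see that the order of the arms matters.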
Working with plain text files
This section will teach you how to process text files. The following code, which is part of match_me.rs, shows you a handy technique for iterating over a text file line by line:

    use std::fs::File;
    use std::io::{BufRead, BufReader};

    let input_path = ::std::env::args().nth(1).unwrap();
    let file = BufReader::new(File::open(&input_path).unwrap());
    for line in file.lines() {
        // Each item is a Result, so it must be unwrapped before use.
        let my_line = line.unwrap();
        println!("{}", my_line);
    }

The following Rust code demonstrates how to select the lines that contain a given static string that’s provided as a command-line argument to the program:

    if my_line.contains(&string_to_match) {
        println!("Found it!");
    }
This code shows that sometimes a different technique can be simpler and more efficient than the use of pattern matching. In this case, the contains() function does the work for you much more elegantly by trying to find the given text inside its input. Using a regular expression or pattern matching would have been unnecessarily complicated.
All lines that match the given static text are saved in another file that’s also given as a command-line argument to match_me.rs. You can see the full Rust code on the LXFDVD. The following output shows match_me.rs in action when processing itself:

    $ ./match_me
    thread '<main>' panicked at 'Usage: name input output string', match_me.rs:13
    $ ./match_me match_me.rs outputFile total_lines
    Lines processed: 41
    $ cat outputFile
    let mut total_lines = 0;
    total_lines = total_lines + 1;
    println!("Lines processed: {}", total_lines);
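The full match_me.rs isn’t printed here, so the following is only a sketch of the filtering step, not the author’s code. The function name copy_matching_lines is an assumption, and in-memory buffers stand in for the input and output files so that the example is self-contained.

```rust
use std::io::{self, BufRead, Cursor, Write};

/// Copies every line of `input` that contains `needle` to `output`.
/// Returns (lines processed, lines matched).
fn copy_matching_lines<R: BufRead, W: Write>(
    input: R,
    mut output: W,
    needle: &str,
) -> io::Result<(usize, usize)> {
    let mut total_lines = 0;
    let mut matched = 0;
    for line in input.lines() {
        let my_line = line?;
        total_lines += 1;
        if my_line.contains(needle) {
            writeln!(output, "{}", my_line)?;
            matched += 1;
        }
    }
    Ok((total_lines, matched))
}

fn main() -> io::Result<()> {
    // In-memory stand-ins for the input and output files.
    let input = Cursor::new("let mut total_lines = 0;\nlet x = 1;\n");
    let mut output = Vec::new();
    let (total, matched) = copy_matching_lines(input, &mut output, "total_lines")?;
    println!("Lines processed: {}", total);
    println!("Lines matched: {}", matched);
    print!("{}", String::from_utf8_lossy(&output));
    Ok(())
}
```

Because the function is generic over BufRead and Write, passing a BufReader around File::open() and the result of File::create() should be all that’s needed to work on real files.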
Memory mapped file I/O
Memory mapped I/O is a pretty specialised technique that connects a file on disk with a buffer in memory. This is done in such a way that when you read or write bytes in the buffer, the matching bytes of the file are actually being read or written. One way to use memory mapped file I/O in Rust is with the help of the mmap crate. The following Rust code, which is a part of the memoryMapped Cargo project, illustrates the use of the mmap crate:

    let mmap_opts = &[
        MapOption::MapNonStandardFlags(libc::MAP_SHARED),
        MapOption::MapReadable,
        MapOption::MapWritable,
        MapOption::MapFd(f.as_raw_fd()),
    ];

The libc crate is a Rust library that provides types and bindings to native C functions. It supports not only functions but also constants and type definitions. Its main purpose is to allow Rust developers to work with C code on all platforms supported by Rust by providing a straight binding to the actual system functions on the current platform.
You can find more information about the libc crate at https://crates.io/crates/libc. Its documentation will show you the definition of the MAP_SHARED constant (defined in https://doc.rust-lang.org/libc/x86_64-unknown-linuxgnu/libc/constant.MAP_SHARED.html), i.e.:

    pub const MAP_SHARED: c_int = 0x0001;
Unsafe blocks
As shown (see bottom, p89) in the documentation page for the libc::chmod function for the linux-gnu platform, the c_char type is a binding to the C char type. Similarly, the c_int type is a binding for the C int type. Executing the memoryMapped project generates a new file whose name is given as a command-line argument:

    $ cargo run newFile
    $ ls -l newFile
    -rw-r--r-- 1 mtsouk mtsouk 262145 Apr 11 09:30 newFile
    $ head -1 newFile
    Hello LXF from Mihalis Tsoukalos!
    $ file newFile
    newFile: ASCII text
It’s impressive that the generated file is much bigger than the included text! A useful fact regarding memory-mapped file access is that the buffer you create in your program is backed directly by the disk file, so the operating system loads its contents on demand rather than copying the whole file into memory up front. The biggest advantage you get from using memory mapped file I/O is increased performance, especially when dealing with large files. However, the price you pay for better performance is having to write more lines of relatively complex code.
The last part of the main.rs file contains the following code, which is an unsafe block and will, therefore, need some explanation:

    unsafe {
        ptr::copy(src_data.as_ptr(), data, src_data.len());
    }
First of all, the sample code shows how to use the unsafe keyword and define an unsafe block. It’s important to understand that the Rust compiler isn’t always able to verify the safety of your code, so wrapping your code in an unsafe block tells the compiler not to worry about the code and that you, the developer, are responsible!
Bear in mind that problems such as deadlocks, integer overflow, divide by zero and memory leaks are just as unwelcome in unsafe blocks as anywhere else, and the unsafe keyword does nothing to prevent them. Unsafe blocks should be reserved for accessing or updating static mutable variables, dereferencing raw pointers and, as in the example (above), calling unsafe functions. Unsafe functions usually trade safety for speed, and they expect the programmer to be fully aware of what they are doing.
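The following sketch shows the same ptr::copy call in a form that runs without the mmap crate: an ordinary byte buffer stands in for the memory-mapped region. The function name copy_into and the length check are assumptions added for illustration; the check is what lets the programmer vouch for the unsafe call.

```rust
use std::ptr;

/// Copies `src_data` to the start of `buffer` using a raw-pointer copy.
/// Returns false (and copies nothing) if the buffer is too small.
/// (`copy_into` is a hypothetical helper, not part of the memoryMapped project.)
fn copy_into(buffer: &mut [u8], src_data: &[u8]) -> bool {
    if src_data.len() > buffer.len() {
        return false;
    }
    // A raw pointer to the destination, like the one a memory map hands out.
    let data: *mut u8 = buffer.as_mut_ptr();
    // SAFETY: both pointers are valid for `src_data.len()` bytes (checked
    // above) and u8 values have no alignment requirement beyond one byte.
    unsafe {
        ptr::copy(src_data.as_ptr(), data, src_data.len());
    }
    true
}

fn main() {
    let mut buffer = vec![0u8; 16];
    let copied = copy_into(&mut buffer, b"Hello LXF!");
    println!("{} {:?}", copied, String::from_utf8_lossy(&buffer[..10]));
}
```

Keeping the unsafe block this small, with the bounds check in safe code right before it, makes it easier to review the one line the compiler can’t verify.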
Module documentation
Now, we’ll show you how to produce documentation for the memoryMapped project. Documentation must live where the Rust code is, e.g. to add documentation for main.rs, you should edit main.rs. Documentation information begins with either ‘///’ or ‘//!’ and is used by the rustdoc command-line utility. The Rust compiler doesn’t treat such lines as code because each line that begins with ‘//’ is considered a comment. The information that begins with ‘//!’ applies to the enclosing item, typically the entire code file, whereas anything beginning with ‘///’ documents the item that follows it.
If you are using Cargo to build your project, then instead of using rustdoc to generate documentation you should execute cargo doc. The output files will be generated inside the ./target/doc directory inside your project directory. For the memoryMapped project, you should start with the ./target/doc/memoryMapped/index.html file (which is pictured above).
There are four special headers: the example header, the panics header, the safety header and the failures header. The example header, which is used in main.rs, must contain valid Rust code. In order to include Rust code in an example header, you should embed it between lines of triple backticks. Should you wish to include other kinds of code, you should name the language after the opening backticks, using the following notation:

    //! ```c
    //! printf("This is not Rust code!\n");
    //! ```
It’s important to use the correct notation because otherwise rustdoc won’t be able to check the correctness of your Rust code and make sure that the code isn’t outdated or incorrect! In case of an error in the format of the documentation information, the error messages that will be generated will be similar to the following:

    src/main.rs:36:8: 36:32 error: expected one of `.`, `;`, `}`, or an operator, found `/// Inline documentation`
    src/main.rs:36 x * 2 /// Inline documentation
The following documentation code will print the double_me() string in a different font (as you can see pictured above):

    /// This is a dummy function named `double_me()`.
The approach that Rust takes is actually quite useful because you write your documentation near your code. However, this assumes that you’re willing to write documentation for your code! Rust generates local documentation information for all crates that are used by the project (as you can also see pictured above).
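As a small sketch of the two comment styles side by side, the following code documents a module with ‘//!’ and a function with ‘///’. The module name arithmetic and the body of double_me() are assumptions; only the first documentation line of the function comes from the project.

```rust
pub mod arithmetic {
    //! Helpers for the documentation demo.
    //! A `//!` comment documents the item that contains it (this module).

    /// This is a dummy function named `double_me()`.
    ///
    /// # Panics
    ///
    /// In debug builds this panics if doubling `x` overflows an `i32`.
    pub fn double_me(x: i32) -> i32 {
        x * 2
    }
}

fn main() {
    println!("{}", arithmetic::double_me(21));
}
```

Text wrapped in single backticks, such as the function name above, is what rustdoc renders in a different font, and a line starting with ‘# ’ inside a documentation comment becomes one of the special headers.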
What you should remember from this section is that documentation for Rust code begins with either a triple slash (‘///’) or a double slash followed by an exclamation mark (‘//!’).