Linux Format

Rust: File I/O.

In the next part of Mihalis Tsoukalos’ Rust series, he explains how to use file I/O in your Rust programs for such things as processing text files.


Mihalis Tsoukalos continues to get you up to speed with the bullet-proof language, covering its file input and output operations.

Hopefully, you’ll be feeling more comfortable with Rust, which means it’s time to deal with a subject that’s very important in systems programming and operating systems: file I/O. Rust’s strict programming rules come in handy when dealing with files because they reduce errors and bugs and allow you to concentrate on what really matters, which is the algorithm and the operations that you want to perform. First, we’ll present some simple examples before continuing with more advanced topics, such as processing text files and memory-mapped file I/O. This tutorial will also cover some important Rust topics, including pointers, pattern matching and the usize and isize types. If at any time you feel that you need some help, you should review the previous two Rust tutorials [Coding Academy, p88, LXF211, p84, LXF210] or visit the Rust documentation web page. So, without further ado, let’s get started!

Command-line arguments

The following Rust code, which we’ve called arguments.rs, prints all command-line arguments given to a program, including the name of the executable:

    use std::env;

    fn main() {
        let mut counter = 0;
        // env::args() yields the executable name first, then each argument.
        for argument in env::args() {
            counter = counter + 1;
            println!("{}: {}", counter, argument);
        }
    }

As you will see in the rest of the tutorial, you can directly access a specific command-line argument. However, the presented technique is generic, simple and elegant.
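As a minimal sketch of accessing one specific argument directly, the following code uses the nth() method of the iterator returned by env::args(). The argument_at() helper and the usage message are hypothetical names introduced here for illustration and aren't part of arguments.rs:

```rust
use std::env;

/// Returns the argument at `index`, or `None` if it was not supplied.
/// (Hypothetical helper: generic over any iterator of Strings so that it
/// can be tried out without real command-line arguments.)
fn argument_at<I: Iterator<Item = String>>(mut args: I, index: usize) -> Option<String> {
    args.nth(index)
}

fn main() {
    // Index 0 is the name of the executable, so index 1 is the first argument.
    match argument_at(env::args(), 1) {
        Some(first) => println!("First argument: {}", first),
        None => println!("Usage: arguments <something>"),
    }
}
```

Matching on the returned Option avoids the panic that a bare unwrap() would cause when the argument is missing.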

A simple example

The following Rust code, called simple.rs, demonstrates how to open a file for reading and how to create an empty file for writing, which are the two simplest forms of file I/O:

    use std::env;
    use std::fs::File;
    use std::path::Path;

    fn main() {
        // The first argument is the file to read, the second the file to create.
        let open = env::args().nth(1).unwrap();
        let write = env::args().nth(2).unwrap();

        let path = Path::new(&open);
        let display = path.display();
        // Open the input file in read-only mode.
        let _file = match File::open(&path) {
            Err(why) => panic!("couldn't open {}: {}", display, why),
            Ok(file) => file,
        };

        let write_path = Path::new(&write);
        let write_display = write_path.display();
        // Create (or truncate) the output file in write-only mode.
        let _write_file = match File::create(&write_path) {
            Err(why) => panic!("couldn't create {}: {}", write_display, why),
            Ok(write_file) => write_file,
        };
    }

Both filenames are given as command-line arguments to simple.rs and converted to Path variables. Our example also includes error-handling code, which is necessary because bad things can happen when dealing with files and file permissions. The File::create() method opens a file in write-only mode. If the file already exists, then its contents will be completely deleted, otherwise an empty file will be created (see bottom, p88, to see simple.rs in action). There exist many crates that allow you to perform file I/O. The general rule is to use the library functions that you know best.
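To illustrate the truncating behaviour of File::create() described above, here is a small sketch that writes to a file, creates it a second time and reports the resulting size. The size_after_recreate() function and the temporary filename are assumptions made for this example, and the code uses the ? operator as a shorthand for the match-based error handling used in simple.rs:

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Writes some text to `path`, creates the file again and returns its final size.
/// (Hypothetical helper, not part of simple.rs.)
fn size_after_recreate(path: &Path) -> io::Result<u64> {
    // First create: the file is new (or truncated) and we write into it.
    let mut file = File::create(path)?;
    file.write_all(b"some old contents\n")?;
    drop(file); // close the handle before creating the file again

    // Second create: the existing contents are discarded.
    File::create(path)?;
    Ok(fs::metadata(path)?.len())
}

fn main() {
    // The filename is an arbitrary choice for this sketch.
    let path = std::env::temp_dir().join("lxf_create_sketch.txt");
    match size_after_recreate(&path) {
        Ok(size) => println!("size after second create: {} bytes", size),
        Err(why) => println!("couldn't use {}: {}", path.display(), why),
    }
}
```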

The next program, cp.rs, uses command-line arguments to get the name of the file you want to copy as well as the filename of the new file. Alongside the Rust code of cp.rs (see top, p88), you can see various executions of it, including situations where error messages are produced as a result of various error conditions that are caught by Rust. As you can see, cp.rs uses fs::copy to create the copy of a file.
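The source of cp.rs only appears in the screenshot, so what follows is a minimal sketch of the same idea and not the author's program. It assumes a hypothetical copy_file() wrapper and its own usage message; the only part taken from the text is the use of fs::copy, which returns the number of bytes copied:

```rust
use std::env;
use std::fs;
use std::io;
use std::process;

/// Copies `from` to `to` and returns the number of bytes copied.
/// (Hypothetical wrapper around fs::copy.)
fn copy_file(from: &str, to: &str) -> io::Result<u64> {
    fs::copy(from, to)
}

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 3 {
        eprintln!("Usage: cp source destination");
        process::exit(1);
    }
    match copy_file(&args[1], &args[2]) {
        Ok(bytes) => println!("Copied {} bytes", bytes),
        Err(why) => {
            // A missing source file or a permission problem ends up here.
            eprintln!("couldn't copy {} to {}: {}", args[1], args[2], why);
            process::exit(1);
        }
    }
}
```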

Pattern matching

We can now go on to learn about pattern matching in Rust, as it can be very useful in many situations and especially when you want to process text files and examine their contents. Pattern matching in Rust happens with the help of the ‘match’ keyword, as you briefly saw in the error-handling code. A match expression must catch all possible values of the variable used, so having a default branch at the end is very common. The default branch is defined with the help of the underscore character (‘_’), which is a ‘catch all’. However, in some rare situations, such as when you examine a condition that can be either true or false, a default branch isn’t needed. The Rust code below showcases the use of pattern matching in Rust:

    match an_age {
        0 => "Too young!!",
        -1 | -2 => "at the age of cheating!",
        1..=18 => "Still very young!",
        19..=25 => "At the age of studying!",
        ready_for_work @ 26..=40 => "At the age of working!",
        _ if an_age % 10 == 0 => "at a new decade!",
        _ => "Unknown age!",
    }

As you can see, you can use inclusive ranges with the help of the ‘..=’ operator (older versions of Rust used an ellipsis, ‘...’, for this) or multiple values as a list with the help of the ‘|’ character. Additionally, you can give a range a name, as happens with the 26-40 range, which has the name ready_for_work. Pattern matching also works on tuples and other types of variables. Executing the previous code, saved in reg_exp.rs, produces the following output:

    25: you are At the age of studying!
    26: you are At the age of working!
    ...
    40: you are At the age of working!
    41: you are Unknown age!
    ...
    50: you are at a new decade!

As you can see from the output, the order in which you put the various conditions plays a key role because only the first match gets executed, as happened with the age of 40. If you’re interested in regular expressions, then things are a little more complicated because you will have to use a separate crate, called regex, to do your job. Regular expressions and pattern matching play a key role in modern programming languages, so you should try as many examples as possible in order to really understand how both concepts work in Rust.
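To experiment with the ordering of match arms, the following self-contained sketch wraps a simplified version of the match in a function. The describe() name, the i32 type and the list of sample ages are assumptions made for illustration; the complete reg_exp.rs isn't reproduced here:

```rust
/// Returns a description for the given age.
/// (Hypothetical function; a simplified version of the match shown earlier.)
fn describe(an_age: i32) -> &'static str {
    match an_age {
        0 => "Too young!!",
        1..=18 => "Still very young!",
        19..=25 => "At the age of studying!",
        26..=40 => "At the age of working!",
        // This guard is only reached when no earlier arm has matched.
        _ if an_age % 10 == 0 => "at a new decade!",
        _ => "Unknown age!",
    }
}

fn main() {
    for &age in [25, 26, 40, 41, 50].iter() {
        println!("{}: you are {}", age, describe(age));
    }
}
```

Moving the guarded arm above the 26..=40 arm would make 30 and 40 report a new decade instead, which is a quick way to see that the first matching arm wins.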

Working with plain text files

This section will teach you how to process text files. The following code, which is part of match_me.rs, shows you a handy technique for iterating over a text file line by line:

    use std::fs::File;
    use std::io::{BufRead, BufReader};

    let input_path = ::std::env::args().nth(1).unwrap();
    let file = BufReader::new(File::open(&input_path).unwrap());
    for line in file.lines() {
        // Each item is an io::Result<String>, so unwrap it to get the text.
        let my_line = line.unwrap();
        println!("{}", my_line);
    }

The following Rust code demonstrates how to select the lines that contain a given static string that’s provided as a command-line argument to the program:

    if my_line.contains(&string_to_match) {
        println!("Found it!");
    }

This code shows that sometimes a different technique can be simpler and more efficient than the use of pattern matching. In this case, the contains() function does the work for you much more elegantly by trying to find the given text inside its input. Using a regular expression or pattern matching would have been unnecessarily complicated.

All lines that match the given static text are saved in another file that’s also given as a command-line argument to match_me.rs. You can see the full Rust code on the LXFDVD. The following output shows match_me.rs in action when processing itself:

    $ ./match_me
    thread '<main>' panicked at 'Usage: name input output string', match_me.rs:13
    $ ./match_me match_me.rs outputFile total_lines
    Lines processed: 41
    $ cat outputFile
    let mut total_lines = 0;
    total_lines = total_lines + 1;
    println!("Lines processed: {}", total_lines);
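The full match_me.rs lives on the LXFDVD, so the following is only a sketch of how the pieces described in this section might fit together. The copy_matching_lines() function and its generic signature are assumptions made here; the argument order and the messages follow the output shown above:

```rust
use std::env;
use std::fs::File;
use std::io::{self, BufRead, BufReader, Write};

/// Copies every line of `input` that contains `needle` to `output`
/// and returns the total number of lines read.
/// (Hypothetical helper, not taken from match_me.rs.)
fn copy_matching_lines<R: BufRead, W: Write>(
    input: R,
    mut output: W,
    needle: &str,
) -> io::Result<usize> {
    let mut total_lines = 0;
    for line in input.lines() {
        let my_line = line?;
        total_lines = total_lines + 1;
        if my_line.contains(needle) {
            writeln!(output, "{}", my_line)?;
        }
    }
    Ok(total_lines)
}

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 4 {
        panic!("Usage: name input output string");
    }
    let input = BufReader::new(File::open(&args[1]).unwrap());
    let output = File::create(&args[2]).unwrap();
    let total_lines = copy_matching_lines(input, output, &args[3]).unwrap();
    println!("Lines processed: {}", total_lines);
}
```

Making the function generic over BufRead and Write is a design choice of this sketch: it lets you try the logic on in-memory data as well as on real files.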

Memory mapped file I/O

Memory-mapped I/O is a pretty specialised technique that connects a file on disk with a buffer in memory. This is done in such a way that when you read or write bytes in the buffer, the matching bytes of the file are actually being read or written. One way to use memory-mapped file I/O in Rust is with the help of the mmap crate. The following Rust code, which is a part of the memoryMapped Cargo project, illustrates the use of the mmap crate:

    let mmap_opts = &[
        MapOption::MapNonStandardFlags(libc::MAP_SHARED),
        MapOption::MapReadable,
        MapOption::MapWritable,
        MapOption::MapFd(f.as_raw_fd()),
    ];

The libc crate is a Rust library for types and bindings to native C functions. The libc crate not only supports functions but also constants and type definitions. Its main purpose is allowing Rust developers to work with C code on all platforms supported by Rust by providing a straight binding to the actual system functions on the current platform.

You can find more information about the libc crate at https://crates.io/crates/libc. This will show you the definition of the MAP_SHARED constant (defined in https://doc.rust-lang.org/libc/x86_64-unknown-linuxgnu/libc/constant.MAP_SHARED.html), ie:

    pub const MAP_SHARED: c_int = 0x0001;

Unsafe blocks

As shown (see bottom, p89) in the documentation page for the libc::chmod function for the linux-gnu platform, the c_char type is a binding to the C char type. Similarly, the c_int type is a binding for the C int type. Executing the memoryMapped project generates a new file whose name is given as a command-line argument:

    $ cargo run newFile
    $ ls -l newFile
    -rw-r--r-- 1 mtsouk mtsouk 262145 Apr 11 09:30 newFile
    $ head -1 newFile
    Hello LXF from Mihalis Tsoukalos!
    $ file newFile
    newFile: ASCII text

It’s impressive that the generated file is much bigger than the included text! A useful fact regarding memory-mapped file access is that the memory buffer that you create in your program isn’t a separate copy that you have to fill yourself, because it’s directly associated with the disk file. The biggest advantage you get from using memory-mapped file I/O is increased performance, especially when dealing with large files. However, the price you pay for better performance is having to write more lines of relatively complex code.

The last part of the main.rs file contains the following code, which is an unsafe block and will, therefore, need some explanation:

    unsafe {
        // Copy src_data.len() bytes from the source into the mapped buffer.
        ptr::copy(src_data.as_ptr(), data, src_data.len());
    }

First of all, the sample code shows how to use the unsafe keyword and define an unsafe block. It’s important to understand that the Rust compiler isn’t always able to verify the safety of your code, so wrapping your code using the unsafe keyword tells the compiler not to worry about the code and that you, the developer, are responsible!

As a result, you need to take extra care to avoid situations such as deadlocks, integer overflow, division by zero and memory leaks in unsafe blocks of code. This type of code block should be used for accessing or updating static mutable variables, dereferencing raw pointers and, as in the example above, calling unsafe functions. What unsafe functions usually do is trade safety for speed, but they expect the programmer to be fully aware of what they are doing.
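The unsafe block from main.rs depends on the memory map, so here is a standalone sketch of the same ptr::copy call that uses an ordinary byte buffer as the destination instead. The copy_bytes() function, the buffer size and the sample text are assumptions made for illustration:

```rust
use std::ptr;

/// Copies `src` into the start of `dst` using `ptr::copy`.
/// (Hypothetical helper; the length check is what makes the unsafe call sound.)
fn copy_bytes(src: &[u8], dst: &mut [u8]) {
    assert!(src.len() <= dst.len(), "destination buffer is too small");
    unsafe {
        // The compiler can't verify raw pointers, so we guarantee that both
        // pointers are valid for src.len() bytes.
        ptr::copy(src.as_ptr(), dst.as_mut_ptr(), src.len());
    }
}

fn main() {
    let src_data = b"Hello LXF!";
    let mut buffer = vec![0u8; 16];
    copy_bytes(src_data, &mut buffer);
    println!("{}", String::from_utf8_lossy(&buffer[..src_data.len()]));
}
```

Wrapping the unsafe call in a small function that checks the lengths first keeps the part you have to reason about by hand as small as possible.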

Module documentation

Now, we’ll show you how to produce documentation for the memoryMapped project. You need to put the documentation where the Rust code is, eg to add documentation for main.rs, you should edit main.rs. Documentation information begins with either ‘///’ or ‘//!’ and is used by the rustdoc command-line utility. However, the Rust compiler ignores such information because each line that begins with ‘//’ is considered a comment by the compiler. The information that begins with ‘//!’ applies to the entire code file, whereas anything beginning with ‘///’ is considered local information that documents the item that follows it.

If you are using Cargo to build your project, then instead of using rustdoc to generate documentation you should execute cargo doc. The output files will be generated inside the ./target/doc directory inside your project directory. For the memoryMapped project, you should start with the ./target/doc/memoryMapped/index.html file (which is pictured above).

There are four special headers: the example header (used in main.rs), which must contain valid Rust code, the panics header, the safety header and the failures header. In order to include Rust code in an example header, you should embed it in triple backticks. Should you wish to include other kinds of code, you should use the following notation:

    //! ```c
    //! printf("This is not Rust code!\n");
    //! ```

It’s important to use the correct notation because otherwise rustdoc won’t be able to check the correctness of your Rust code and make sure that the code isn’t outdated or incorrect! In case of an error in the format of the documentation information, the error messages that will be generated will be similar to the following:

    src/main.rs:36:8: 36:32 error: expected one of `.`, `;`, `}`, or an operator, found `/// Inline documentation`
    src/main.rs:36 x * 2 /// Inline documentation

The following documentation code will print the double_me() string in a different font (as you can see pictured above):

    /// This is a dummy function named `double_me()`.

The approach that Rust takes is actually quite useful because you write your documentation near your code. However, this assumes that you’re willing to write documentation for your code! Rust generates local documentation information for all crates that are used by the project (as you can also see pictured above).
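As a rough sketch of what a documented function can look like, the following code attaches a ‘///’ comment with a panics header to double_me(). The i32 signature and the wording of the comment are assumptions made here, as the text only shows the function’s first documentation line and its x * 2 body:

```rust
/// This is a dummy function named `double_me()`.
///
/// # Panics
///
/// Panics in debug builds if doubling `x` overflows an `i32`.
fn double_me(x: i32) -> i32 {
    x * 2
}

fn main() {
    println!("{}", double_me(21));
}
```

Note that the documentation comment sits on the lines before the function; placing a ‘///’ comment after an expression on the same line is what triggers the kind of error message shown earlier.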

What you should remember from this section is that the documenting of Rust code begins with either a triple slash (‘///’) or a double slash followed by an exclamation mark (‘//!’).

This screenshot shows the source code of cp.rs as well as various executions of its binary file.
This is the main Rust documentation web page where you can find useful tutorials and references. However, nothing can replace practice!
Mihalis Tsoukalos (@mactsouk) has an M.Sc. in IT from UCL and a B.Sc. in Mathematics. He’s a DB-admining, software-coding, Unix-using, mathematical machine.
This is the documentation for the libc::chmod function of the libc crate, which enables you to call C system functions on all platforms supported by Rust.
The rustdoc utility, as well as the cargo doc command, can generate HTML documentation for your Rust code as long as you put the required information inside your Rust code.
