Strings

Strings are complicated. ("Anyone who says differently is selling something.")

In Perl, a lot of the complications are hidden away. This is an example of Perl's DWIM ("Do What I Mean") approach. For the most part, strings in Perl behave as we expect and we don't really think about it too much.

In Rust, all of the complications of strings are right in our faces. Rust's approach is not DWIM; rather, Rust prefers to be explicit about almost everything. One consequence of this is that we end up with multiple types for strings!

Let's look at our "hello name" examples again. In Perl, we had

#!/usr/bin/env perl

use v5.28;
use warnings;

my $name = "Tim";

say "Hello, $name!";

while in Rust, we had

fn main() {
    let name = "Tim";

    println!("Hello, {}!", name);
}

Again, we see that Rust's let is kind of like Perl's my and it's apparently inferring the type of the string literal. Writing it explicitly, we'd have the following

fn main() {
    let name: &str = "Tim";

    println!("Hello, {}!", name);
}

So the type of name is &str. What is that? It's a string slice, which is not very similar to a Perl string. The string literal is stored somewhere in the program's static memory and we just get a borrowed view into it. Rust has another type called String which is more akin to a Perl string. A Rust String owns a growable, heap-allocated chunk of memory holding our string, but to get one from a string literal we have to convert it. One way to do that is like so

fn main() {
    let name: String = String::from("Tim");

    println!("Hello, {}!", name);
}

Again, we could let Rust infer the type (the annotation truly feels redundant here) and just write

fn main() {
    let name = String::from("Tim");

    println!("Hello, {}!", name);
}

but it still seems like a lot of work for just a string. But wait, there's more!

Decoded or not?

A String in Rust is a sequence of bytes that is guaranteed to be valid UTF-8. And a &str is a slice that always points to a valid UTF-8 sequence, so it can be used to view into a String as well as a static string literal. So these are akin to decoded strings in Perl.
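As a rough sketch of that relationship, the example below passes both a string literal and an owned String to the same function through a &str. The greet helper is a name made up for illustration; it is not part of the earlier examples.

```rust
// Hypothetical helper: accepts any string slice, wherever the bytes live.
fn greet(name: &str) -> String {
    format!("Hello, {}!", name)
}

fn main() {
    let literal: &str = "Tim";               // view into static program data
    let owned: String = String::from("Tim"); // owned, growable buffer

    // A &String coerces to &str, so one function serves both.
    println!("{}", greet(literal));
    println!("{}", greet(&owned));

    // Slicing an owned String also yields a &str view into it.
    let first_two: &str = &owned[0..2];
    println!("{}", first_two); // Ti
}
```

Taking &str as the parameter type is the usual choice here, since callers holding either kind of string can use the function without converting first.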

In Perl, if we don't decode a string, explicitly or implicitly, then it's just a sequence of arbitrary bytes. The same thing in Rust would be a byte slice.


fn main() {
    let name = b"Tim";
    println!("{:?}", name);
}

Running this would produce [84, 105, 109], where 84 is the 'T', 105 is the 'i', and 109 is the 'm'. So b"Tim" contains all of the data to make a string, but it's not really a string yet.
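To turn those bytes into a string, we have to decode them, much like in Perl. Here is a minimal sketch using std::str::from_utf8 from the standard library; the decode helper and the sample byte values are made up for illustration.

```rust
// Hypothetical helper: try to decode raw bytes as UTF-8.
fn decode(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

fn main() {
    let name = b"Tim";
    println!("{:?}", decode(name)); // Some("Tim")

    // 0xE9 is "é" in Latin-1, but here it does not form valid UTF-8.
    let not_utf8: &[u8] = &[0x54, 0xE9, 0x6D];
    println!("{:?}", decode(not_utf8)); // None
}
```

The decoding step can fail, so Rust makes us handle that case explicitly instead of quietly handing back a mangled string.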

Characters

Now is a good time to mention that a character in Rust is not stored in a byte. A char is a single Unicode scalar value, and it always takes four bytes, the same as one UTF-32 code unit. So a string in Rust is not stored as a sequence of chars! A String is a UTF-8 sequence, where each character takes between one and four bytes, whereas a char is always a four-byte value.
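A small sketch can make the difference concrete. The sizes helper below is a hypothetical name; it reports how many bytes a string occupies as UTF-8 and how many chars it decodes into.

```rust
use std::mem::size_of;

// Hypothetical helper: (UTF-8 byte length, number of chars) for a string.
fn sizes(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // A char is always four bytes, whatever character it holds.
    println!("{}", size_of::<char>()); // 4

    // Pure ASCII: one byte per character.
    println!("{:?}", sizes("Tim")); // (3, 3)

    // U+00EB (e with diaeresis) takes two bytes in UTF-8.
    println!("{:?}", sizes("Zo\u{eb}")); // (4, 3)
}
```

Note that len() counts bytes, not characters, which is an easy thing to trip over when coming from Perl's length on a decoded string.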

Foreign strings

The Rust standard library also contains some string types for dealing with sequences of bytes that do not decode into valid UTF-8, but are still considered strings in other contexts.

For things like path names, we have operating system strings, std::ffi::OsString and std::ffi::OsStr. An OsString is like a String, but it holds whatever the platform uses natively, so it may contain data that is not valid UTF-8: arbitrary bytes in a Unix file name, say, or unpaired UTF-16 surrogates on Windows. OsStr is analogous to str, so we usually see it as &OsStr just as we usually see &str.
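Here is a minimal sketch of moving between operating system strings and ordinary ones. The as_text helper and the file name are made up for illustration; the conversion methods come from the standard library.

```rust
use std::ffi::{OsStr, OsString};
use std::path::Path;

// Hypothetical helper: borrow an OS string as text, if it is valid UTF-8.
fn as_text(s: &OsStr) -> Option<&str> {
    s.to_str()
}

fn main() {
    let owned: OsString = OsString::from("notes.txt");
    let borrowed: &OsStr = owned.as_os_str();
    println!("{:?}", as_text(borrowed)); // Some("notes.txt")

    // Path components come back as OS strings, not as Strings.
    let file_name: Option<&OsStr> = Path::new("/tmp/notes.txt").file_name();
    println!("{:?}", file_name);

    // to_string_lossy replaces anything that does not decode with U+FFFD.
    println!("{}", owned.to_string_lossy());
}
```

Because to_str returns an Option, the case where a file name is not valid UTF-8 has to be dealt with rather than assumed away.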

Rust also has types just for going back and forth between Rust and C code, std::ffi::CString and std::ffi::CStr. In C, strings are nul-terminated sequences of bytes, whereas a Rust String stores its length and has no terminator. Converting between the two is not free, so we sometimes use CString and &CStr instead.
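As a rough sketch, here is what building and inspecting a C string looks like without calling any actual C code. The to_c helper is a hypothetical name; CString::new, as_bytes_with_nul, as_c_str, and to_str are standard library calls.

```rust
use std::ffi::{CStr, CString};

// Hypothetical helper: build a C string, failing on interior nul bytes.
fn to_c(s: &str) -> Option<CString> {
    CString::new(s).ok()
}

fn main() {
    let c_name = to_c("Tim").expect("no interior nul bytes");

    // The terminating nul byte is appended for us.
    println!("{:?}", c_name.as_bytes_with_nul()); // [84, 105, 109, 0]

    // Going back to a &str means validating the bytes as UTF-8.
    let borrowed: &CStr = c_name.as_c_str();
    println!("{:?}", borrowed.to_str()); // Ok("Tim")

    // A nul byte in the middle cannot be represented in a C string.
    println!("{}", to_c("Ti\0m").is_none()); // true
}
```

Both directions involve a check (interior nul bytes one way, UTF-8 validity the other), which is part of why the conversion has a cost.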