Strings
Strings are complicated. ("Anyone who says differently is selling something.")
In Perl, a lot of the complications are hidden away. This is an example of Perl's DWIM approach. For the most part, strings in Perl behave as we expect and we don't really think about it too much.
In Rust, all of the complications of strings are right in our faces. Rust's approach is not DWIM; Rust prefers to be explicit about almost everything. One consequence of this is that we end up with multiple types for strings!
Let's look at our "hello name" examples again. In Perl, we had
#!/usr/bin/env perl
use v5.28;
use warnings;
my $name = "Tim";
say "Hello, $name!";
while in Rust, we had
fn main() {
    let name = "Tim";
    println!("Hello, {}!", name);
}
Again, we see that Rust's let is kind of like Perl's my, and it's apparently inferring the type of the string literal. Writing it explicitly, we'd have the following:
fn main() {
    let name: &str = "Tim";
    println!("Hello, {}!", name);
}
So the type of name is &str. What is that? It's a string slice, which is not very similar to a Perl string. A string literal is stored in the compiled program itself (its full type is &'static str), and we just get a view into it. Rust has another type called String which is more akin to a Perl string. A Rust String is a dynamic, growable chunk of heap memory that owns our string, but to get one from a string literal we have to convert it. One way to do that is like so:
fn main() {
    let name: String = String::from("Tim");
    println!("Hello, {}!", name);
}
Again, we could let Rust infer the type (spelling it out truly feels redundant here) and just write
fn main() {
    let name = String::from("Tim");
    println!("Hello, {}!", name);
}
but it still seems like a lot of work for just a string. But wait, there's more!
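String::from is not the only way to get an owned String, and once we have one, it behaves more like a Perl scalar: it can grow in place. Here is a small sketch of a few common conversions and of appending. The function names owned_copies and greeting are made up for illustration.

```rust
// Hypothetical helper: four common ways to turn a string literal (&str)
// into an owned String. All four produce equal Strings.
fn owned_copies() -> Vec<String> {
    let a = String::from("Tim");
    let b = "Tim".to_string();
    let c = "Tim".to_owned();
    let d: String = "Tim".into();
    vec![a, b, c, d]
}

// Hypothetical helper: unlike a &str, a String owns its buffer and can grow,
// somewhat like appending to a Perl scalar with `.=`.
fn greeting(name: &str) -> String {
    let mut s = String::from("Hello, ");
    s.push_str(name); // append a &str
    s.push('!'); // append a single char
    s
}

fn main() {
    for s in owned_copies() {
        println!("{}", s);
    }

    let g = greeting("Tim");
    // Borrowing a String as a &str is cheap: no bytes are copied.
    let view: &str = &g;
    println!("{}", view);
}
```

Which conversion to use is largely a matter of taste; they all allocate a new String and copy the bytes of the literal into it.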
Decoded or not?
A String in Rust is a sequence of bytes that is guaranteed to be valid UTF-8. And a &str is a slice that always points to a valid UTF-8 sequence, so it can be used as a view into a String as well as into a static string literal. So these are akin to decoded strings in Perl.
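Because a &str can point into either kind of storage, a function that takes a &str works with both. The sketch below uses a made-up function, shout, to show this, along with slicing a String to get a &str that views part of its buffer.

```rust
// Hypothetical helper: borrows its input as &str, so it accepts
// string literals and Strings alike.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    // A string literal is baked into the compiled program.
    let literal: &'static str = "Tim";
    let owned = String::from("Timothy");

    println!("{}", shout(literal));
    // &owned is a &String, which Rust coerces to &str automatically.
    println!("{}", shout(&owned));

    // Slicing gives a &str that points into the String's own buffer.
    // The range is in bytes and must fall on UTF-8 character boundaries,
    // otherwise the program panics.
    let first_three: &str = &owned[0..3];
    println!("{}", first_three);
}
```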
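In Perl, if we don't decode a string, explicitly or implicitly, then it's just a sequence of arbitrary bytes. The closest thing in Rust is a byte string literal, written with a b prefix. It gives us a reference to an array of bytes (here &[u8; 3]), which can be used wherever a byte slice, &[u8], is expected.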
fn main() {
    let name = b"Tim";
    println!("{:?}", name);
}
Running this would produce [84, 105, 109], where 84 is the 'T', 105 is the 'i', and 109 is the 'm'. So b"Tim" contains all of the data to make a string, but it's not really a string yet.
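Turning those bytes into a string is the Rust analogue of decoding in Perl. The sketch below uses std::str::from_utf8, which checks that the bytes are valid UTF-8 and returns an error if they are not, and String::from_utf8_lossy, which replaces invalid sequences with U+FFFD instead. The wrapper decode is a made-up name for illustration.

```rust
use std::str;

// Hypothetical helper: view a byte slice as a &str if it is valid UTF-8.
// No copy is made; the &str borrows the same bytes.
fn decode(bytes: &[u8]) -> Option<&str> {
    str::from_utf8(bytes).ok()
}

fn main() {
    let name: &[u8; 3] = b"Tim";
    println!("{:?}", name); // [84, 105, 109]

    // b"Tim" coerces to &[u8] and is valid UTF-8, so decoding succeeds.
    if let Some(s) = decode(name) {
        println!("Hello, {}!", s);
    }

    // 0xFF can never appear in UTF-8, so strict decoding fails.
    println!("{:?}", decode(&[0x54, 0xFF])); // None

    // The lossy variant substitutes U+FFFD for the invalid byte instead.
    println!("{}", String::from_utf8_lossy(&[0x54, 0xFF]));

    // Going the other way costs nothing: a &str already is UTF-8 bytes.
    println!("{:?}", "Tim".as_bytes()); // [84, 105, 109]
}
```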
Characters
I guess now is a good time to mention that a character in Rust is not stored in a single byte. A char is a Unicode scalar value, and it always takes four bytes, the same as one UTF-32 code unit. So a string in Rust is not a sequence of chars! A String is a UTF-8 byte sequence, in which one character can take anywhere from one to four bytes, but a char is a fixed-width, four-byte value.
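The difference shows up as soon as a string contains anything outside ASCII. This sketch compares a string's length in bytes with its number of chars; byte_and_char_counts is a made-up helper. The escape \u{e9} is used for 'é' so the example does not depend on how the source file happens to encode that character.

```rust
use std::mem::size_of;

// Hypothetical helper: count UTF-8 bytes and chars (Unicode scalar values).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // Every char is four bytes, whatever character it holds.
    println!("char is {} bytes", size_of::<char>());

    // "caf\u{e9}" is "café"; the 'é' needs two bytes in UTF-8.
    let word = "caf\u{e9}";
    let (bytes, chars) = byte_and_char_counts(word);
    println!("{} bytes, {} chars", bytes, chars); // 5 bytes, 4 chars

    // chars() decodes the UTF-8 one scalar value at a time.
    for c in word.chars() {
        println!("{:?} takes {} byte(s) in UTF-8", c, c.len_utf8());
    }
}
```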
Foreign strings
The Rust standard library also contains some string types for dealing with sequences of bytes that do not decode into valid UTF-8, but are still considered strings in other contexts.
For things like path names, we have operating system strings, std::ffi::OsString and std::ffi::OsStr. An OsString is like a String, but it can hold data that is not valid UTF-8: arbitrary bytes on Unix-like systems, or ill-formed UTF-16 (such as unpaired surrogates) on Windows. OsStr is analogous to str, so we usually see it as &OsStr, just as we usually see &str.
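In practice, OS strings often show up through std::path. The sketch below converts in both directions between &str and &OsStr; the helper describe is a made-up name. It only uses valid UTF-8 input so it runs the same everywhere; on Unix-like systems, std::os::unix::ffi::OsStrExt can build an OsStr from raw bytes if you want to try the non-UTF-8 case.

```rust
use std::ffi::{OsStr, OsString};
use std::path::Path;

// Hypothetical helper: view an OS string as &str when it is valid UTF-8,
// otherwise fall back to a lossy copy with U+FFFD replacements.
fn describe(os: &OsStr) -> String {
    match os.to_str() {
        Some(s) => format!("valid UTF-8: {}", s),
        None => format!("not UTF-8, lossy: {}", os.to_string_lossy()),
    }
}

fn main() {
    // Every &str is valid UTF-8, so it converts to &OsStr without copying.
    let name: &OsStr = OsStr::new("notes.txt");
    println!("{}", describe(name));

    // OsString is the owned, growable counterpart, as String is to str.
    let mut owned = OsString::from("report");
    owned.push(".pdf");
    println!("{}", describe(&owned)); // &OsString derefs to &OsStr

    // Path methods hand back &OsStr, since file names need not be UTF-8.
    let path = Path::new("/tmp/notes.txt");
    if let Some(file_name) = path.file_name() {
        println!("{}", describe(file_name));
    }
}
```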
Rust also has types just for passing strings back and forth to C code, std::ffi::CString and std::ffi::CStr. In C, strings are null-terminated sequences of bytes. Converting those to and from Rust Strings isn't free (it means scanning for the terminator, checking the bytes, and usually copying), so we sometimes keep them as CString and &CStr instead.
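Here is a sketch of the round trip without any actual C code involved. CString::new appends the terminating 0 byte and rejects input containing an interior 0 byte, since C would read that as the end of the string. Going back, CStr::to_str checks that the bytes are valid UTF-8. The wrapper to_c is a made-up name.

```rust
use std::ffi::{CStr, CString};

// Hypothetical helper: make an owned, null-terminated copy of a Rust string.
// Returns None if the input contains an interior 0 byte.
fn to_c(s: &str) -> Option<CString> {
    CString::new(s).ok()
}

fn main() {
    let c_name = to_c("Tim").expect("no interior nul bytes");
    // The CString owns the bytes of "Tim" plus a trailing 0 byte.
    println!("{:?}", c_name.as_bytes_with_nul()); // [84, 105, 109, 0]

    // A &CStr borrows null-terminated bytes without copying them.
    let borrowed: &CStr = c_name.as_c_str();
    // Getting a &str back requires a UTF-8 check.
    match borrowed.to_str() {
        Ok(s) => println!("Hello, {}!", s),
        Err(e) => println!("not UTF-8: {}", e),
    }

    // An interior nul byte makes the conversion fail.
    println!("{:?}", to_c("Ti\0m")); // None
}
```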