CommonLounge Archive

Python3 Strings

July 26, 2018

We already learnt a little bit about strings in Python3 Basics: Strings and Functions, but in this tutorial we are going to go a little deeper and also see some helpful functions Python provides for strings.

String literals

String literals can be enclosed by either single or double quotes, although single quotes are more commonly used. For example, 'Hello World' or "Hello World".

Escaping

To include certain special characters in a string literal, use a backslash escape: \' is a single quote, \" is a double quote, \\ is a backslash, \t is a tab and \n is a newline. For example:

print('Alice said \'How do you do?\'.\nBob replied, \'Very well thank you!\'')

Output:

Alice said 'How do you do?'.
Bob replied, 'Very well thank you!'

A double-quoted string literal can contain single quotes without any fuss (e.g. "I didn't do it"), and likewise a single-quoted string can contain double quotes.

Raw strings

A raw string literal is prefixed by an r and passes all the characters through without special treatment of backslashes, so r'x\nx' evaluates to a string of length 4: x, a backslash, n and x.

raw = r'this\t\n and that'  
print(raw)
# Output: this\t\n and that
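
As a quick check of the length claim above, the following sketch compares a raw literal with the equivalent regular literal. The variable names are illustrative only.

```python
# Raw literal: the backslash and the n stay as two separate characters.
raw = r'x\nx'
# Regular literal: \n is a single newline character.
regular = 'x\nx'

print(len(raw))
# Output: 4
print(len(regular))
# Output: 3
print(raw == 'x\\nx')  # the same string written with an escaped backslash
# Output: True
```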

Multiline strings

A string literal can span multiple lines, but there must be a backslash \ at the end of each line to escape the newline. String literals inside triple quotes, """ or ''', can span multiple lines of text.

multi = """It was the best of times.
It was the worst of times."""
print(multi)

Output:

It was the best of times.
It was the worst of times.
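
For comparison, here is a minimal sketch of the backslash form described above. The variable name joined is illustrative. Note that the escaped newline is dropped from the string, whereas a triple-quoted literal keeps its newlines.

```python
# A backslash at the end of the line escapes the newline, so the
# literal continues on the next line and no newline is stored.
joined = 'It was the best of times. \
It was the worst of times.'
print(joined)
# Output: It was the best of times. It was the worst of times.
```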

Converting to string

The str() function converts values to a string form so they can be combined with other strings.

pi = 3.14
# text = 'The value of pi is ' + pi        # NO, does not work
# TypeError: can only concatenate str (not "float") to str
text = 'The value of pi is ' + str(pi)     # yes
print(text)

Accessing string characters

Characters in a string can be accessed using the standard [ ] syntax. Remember, Python uses zero-based indexing (like most other programming languages).

s = 'hello'
print(s[1])
# Output: e
print(s[4])
# Output: o
print(s[5]) 
# Error! IndexError: string index out of range

If the index is out of bounds for the string, Python raises an error. The Python style (unlike some other languages like Perl) is to halt if it can’t tell what to do, rather than just make up a default value.
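
If an out-of-range index is a real possibility, one option is to catch the error. The sketch below is only an illustration: the helper name char_at and its default parameter are hypothetical, not part of Python.

```python
def char_at(s, index, default=''):
    """Return s[index], or default if index is out of range (hypothetical helper)."""
    try:
        return s[index]
    except IndexError:
        return default

print(char_at('hello', 1))
# Output: e
print(char_at('hello', 5, '?'))
# Output: ?
```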


Python uses negative numbers to give easy access to the characters at the end of the string: s[-1] is the last char 'o', s[-2] is the next-to-last char 'l', and so on. Negative index numbers count back from the end of the string:

s = 'hello'
print(s[-1]) # last char (1st from the end)
# Output: o
print(s[-4]) # 4th from the end
# Output: e

Note: Python does not have a separate character type. Instead, an expression like s[1] returns a string of length 1 containing that character. With such a length-1 string, the operators ==, <=, … all work as you would expect, so mostly you don’t need to know that Python does not have a separate scalar char type.
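
A short sketch of what this means in practice (the variable name first is illustrative):

```python
s = 'hello'
first = s[0]

print(type(first))   # indexing a string gives back another string
# Output: <class 'str'>
print(len(first))
# Output: 1
print(first == 'h')
# Output: True
print(first <= 'z')  # comparison operators work on length-1 strings
# Output: True
```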

String length

The len(string) function returns the length of a string. The [ ] syntax and the len() function actually work on any sequence type — strings, lists, etc. Python tries to make its operations work consistently across different types.

s = 'hello'
print(len(s))
# Output: 5 
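
To illustrate the point about consistency, the same len() and [ ] operations can be tried on a list. This is a small sketch; the list letters is made up for comparison.

```python
letters = ['h', 'e', 'l', 'l', 'o']  # a list, used here only for comparison
print(len(letters))
# Output: 5
print(letters[1])
# Output: e
print(letters[-1])
# Output: o
```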

Python newbie gotcha: don’t use len as a variable name, since that would shadow (block out) the built-in len() function.

Accessing substrings (slicing)

The slice syntax is a handy way to refer to sub-parts of sequences, typically strings and lists. The slice s[start:end] is the elements beginning at start and extending up to but not including end. For example:

s = 'Hello'
print(s[1:4]) # from index 1 up to index 4 (but not including index 4)
# Output: ell
print(s[1:]) # omitting either index defaults to the start or end of the string
# Output: ello
print(s[:]) # omitting both indexes gives the whole string
# Output: Hello
print(s[1:100]) # an index that is too big is truncated down to the string length
# Output: ello

Just like in indexing, negative numbers can be used while slicing too.

s = 'Hello'
print(s[:-3]) # going up to but not including the last 3 chars.
# Output: He
print(s[-3:]) # starting with the 3rd char from the end and extending to the end of the string.
# Output: llo

It is a neat truism of slices that for any index n, s[:n] + s[n:] == s. This works even for n negative or out of bounds. Or put another way s[:n] and s[n:] always partition the string into two string parts, conserving all the characters.
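
The property can be checked directly. The sketch below tries a few values of n, including negative and out-of-bounds ones; the particular values are arbitrary.

```python
s = 'Hello'
# Try in-range, negative, and out-of-bounds values of n.
for n in [0, 2, 5, -1, -3, 100, -100]:
    left, right = s[:n], s[n:]
    # The two parts always add back up to the original string.
    assert left + right == s
    print(n, repr(left), repr(right))
```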

Modifying strings

Python strings are immutable which means they cannot be changed after they are created. Since strings can’t be changed, we construct new strings as we go to represent computed values. So for example the expression 'hello' + 'there' takes in the 2 strings 'hello' and 'there' and builds a new string 'hellothere'.
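
A minimal sketch of what immutability looks like in practice: assigning to an index fails, so a new string is built from pieces of the old one. The variable name t is illustrative.

```python
s = 'hello'
try:
    s[0] = 'J'  # strings do not support item assignment
except TypeError as err:
    print('TypeError:', err)

# Build a new string instead of changing the old one.
t = 'J' + s[1:]
print(t)
# Output: Jello
print(s)  # the original string is unchanged
# Output: hello
```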

Concatenating (joining) strings

The + operator can concatenate two strings.

s = 'hi'
print(s + ' there')
# Output: hi there

String Methods

A method is like a function, but it runs “on” an object. If the variable s is a string, then the code s.lower() runs the lower() method on that string object and returns the result (this idea of a method running on an object is one of the basic ideas that make up Object Oriented Programming, OOP).

Here are some of the most common string methods:


s.lower(), s.upper() — returns the lowercase or uppercase version of the string. Example:

s = 'hello'
print(s.upper())
# Output: HELLO
print(s)
# Output: hello

Note: The string s itself is not modified. Python string methods usually return a new string.


s.strip() — returns a string with whitespace removed from the start and end

a = ' hi there  '
print(a.strip())
# Output: hi there

s.isalpha() / s.isdigit() / s.isspace() … — tests if all the string characters are in the various character classes

print('hello'.isalpha())
# Output: True
print('hello123'.isalpha())
# Output: False
print('hello '.isalpha())
# Output: False
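
The other tests in this family behave the same way. A brief sketch with a few made-up inputs:

```python
print('2018'.isdigit())
# Output: True
print('20.18'.isdigit())  # the '.' is not a digit
# Output: False
print(' \t\n'.isspace())
# Output: True
print(''.isalpha())       # an empty string has no characters to test
# Output: False
```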

s.startswith('other'), s.endswith('other') — tests if the string starts or ends with the given other string

s = 'hello'
print(s.startswith('hell'))
# Output: True
print(s.startswith('he'))
# Output: True
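
endswith() works the same way from the other end of the string. A small sketch:

```python
s = 'hello'
print(s.endswith('llo'))
# Output: True
print(s.endswith('hell'))  # 'hell' is at the start, not the end
# Output: False
```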

s.find('other') — searches for the given other string (not a regular expression) within s, and returns the first index where it begins or -1 if not found

s = 'hello'
print(s.find('l'))
# Output: 2
print(s.find('ello'))
# Output: 1
print(s.find('allo'))
# Output: -1

s.replace('old', 'new') — returns a string where all occurrences of 'old' have been replaced by 'new'

s = 'hello hello'
print(s.replace('ello', 'i!'))
# Output: hi! hi!

s.split('delim') — returns a list of substrings separated by the given delimiter. The delimiter is not a regular expression, it’s just text.

print('aaa,bbb,ccc'.split(','))
# Output: ['aaa', 'bbb', 'ccc']

As a convenient special case, s.split() (with no arguments) splits on runs of whitespace characters (spaces, tabs, newlines).

s = 'this\tand that'
print(s.split())
# Output: ['this', 'and', 'that']

s.join(list) — opposite of split(), joins the elements in the given list together using the string as the delimiter.

print('---'.join(['aaa', 'bbb', 'ccc']))
# Output: aaa---bbb---ccc
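
Since join() is the opposite of split(), splitting on a delimiter and joining with the same delimiter gives back the original string. The sketch below shows this, using illustrative variable names, and also shows that the round trip does not hold for the no-argument form of split().

```python
line = 'aaa,bbb,ccc'
parts = line.split(',')
print(parts)
# Output: ['aaa', 'bbb', 'ccc']
print(','.join(parts) == line)
# Output: True

# With no arguments, split() collapses runs of whitespace, so the
# original spacing cannot be recovered by joining with one space.
spaced = 'this\tand   that'
print(' '.join(spaced.split()))
# Output: this and that
```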

A Google search for “python str” should lead you to the String Methods section of the official Python documentation, which lists all the str methods.

String formatting

Python has a % operator to put together a string. It takes a format string on the left which can have %d (for int), %s (for string), %f/%g (for floating point), and the matching values in a tuple on the right (a tuple is made of values separated by commas, typically grouped inside parentheses). Here’s an example to make it clear.

## Example: % operator
text = "%d little pigs come out or I'll %s and %s and %s" % (3, 'huff', 'puff', 'blow down')
print(text)

Output:

3 little pigs come out or I'll huff and puff and blow down
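
The floating point codes mentioned above are not used in that example, so here is a short sketch of them. The variable pi and its value are chosen only for illustration.

```python
pi = 3.14159
print('%f' % pi)    # %f defaults to six digits after the decimal point
# Output: 3.141590
print('%.2f' % pi)  # a precision can be given after the dot
# Output: 3.14
print('%g' % pi)    # %g uses a shorter, general format
# Output: 3.14159
print('%s has %d chars' % ('hello', len('hello')))
# Output: hello has 5 chars
```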

Python will raise an error (a TypeError) if you pass an incorrect number of values to be formatted. It will also raise an error if you pass in a string when it expects an integer, and so on.
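
These failure cases can be tried out safely with a small wrapper. The helper try_format below is hypothetical and exists only for this sketch: it returns the formatted string, or the name of the error type if formatting fails.

```python
def try_format(fmt, values):
    """Hypothetical helper: return the formatted string, or the error's type name."""
    try:
        return fmt % values
    except TypeError as err:
        return type(err).__name__

print(try_format('%d pigs and %s', (3, 'a wolf')))
# Output: 3 pigs and a wolf
print(try_format('%d and %d', (1,)))             # too few values
# Output: TypeError
print(try_format('%d and %d', (1, 2, 3)))        # too many values
# Output: TypeError
print(try_format('%d little pigs', ('three',)))  # string where %d expects a number
# Output: TypeError
```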


Multiline expressions: The above line is kind of long. Suppose you want to break it into separate lines. You cannot just split the line after the % as you might in other languages, since by default Python treats each line as a separate statement (on the plus side, this is why we don’t need to type semicolons on each line). To fix this, enclose the whole expression in an outer set of parentheses; the expression is then allowed to span multiple lines.

# add parens to make the long-line work:
text = ("%d little pigs come out or I'll %s and %s and %s" %
    (3, 'huff', 'puff', 'blow down'))
print(text)

Summary

In this tutorial, you learnt about

  • string literals - including literals with special characters and multi-line strings
  • string slicing - for getting sub-parts of strings
  • string methods - all sorts of useful things we can do with strings
  • string formatting - to include values from variables inside a string

Content was originally based on https://developers.google.com/edu/python/strings, but has been modified since. Licensed under CC BY 3.0. Code samples licensed under the Apache 2.0 License.

