Working with Strings
Strings are one of the most common types of data you will encounter while programming. This entire page is made up of strings, my name can be a string, and messages passed back and forth can be strings.
We’ve already seen strings and how they work (data that stores any letters and numbers together), but now I will cover working with them in more detail, since this is something you will encounter a lot while programming in real life and on assignments.
Being good at working with strings will take you far in your programming life.
This is already a very large list of what is possible with strings. Many of these
also apply to general collections like lists. I’ve tried to keep this
to the functions that you will use in practice. You can also use this as a
reference, so don’t worry too much about memorizing each and every function.
Convert to string
Similar to int() and float(), we can use str() to convert things to a string.
Often we convert other types of data into strings to make use of powerful string
processing methods like the ones we see below.
>>> str(1)
'1'
>>> str(1.0)
'1.0'
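As a small sketch of why this conversion is useful: a number has no length, but its string form does. The variable names and the value below are made up for illustration.

```python
# Hypothetical example value; any non-negative whole number works.
population = 1234567

# An int has no length, so convert it to a string first.
population_text = str(population)

# len() counts characters, which here means digits.
digit_count = len(population_text)

print(population_text)
print(digit_count)
```

Calling len(population) directly would raise a TypeError, which is why the conversion comes first.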
A list of characters
Strings can be thought of as a list of characters. In fact, many languages
actually implement them that way, and some books will have you deal with
only character arrays in languages like C, C++ etc. Thinking about strings as lists
lets you use list-like syntax to operate on them.
As an example of how that works, 'harsh183' can be broken up into h, a, r,
s, h, 1, 8, 3.
Length
As seen in earlier lessons, len() works the same way on strings as it does on lists.
>>> username = 'harsh183'
>>> len(username)
8
Access with index numbers
>>> username = 'harsh183'
>>> username[0]
'h'
>>> username[1]
'a'
>>> username[2]
'r'
>>> username[-1]
'3'
>>> username[-2]
'8'
Using index numbers we can access a particular letter from a string, counting either from the front (starting at 0) or from the back (starting at -1).
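To connect this with len(), here is a rough sketch that walks through every valid index of the same username and shows that the last character can be reached from either direction. The list name pairs is just something I made up for the example.

```python
username = 'harsh183'

# Walk through every valid index and collect "index: letter" pairs.
pairs = []
for index in range(len(username)):
    pairs.append(str(index) + ": " + username[index])
print(pairs)

# The last valid index is always len(username) - 1,
# so these two expressions pick the same character.
last_forward = username[len(username) - 1]
last_backward = username[-1]
print(last_forward == last_backward)
```

Going past the last index, for example username[8] here, raises an IndexError.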
Substring
Sometimes you’ll want to get a part of the string, called a substring. We can
also do this with list syntax, using start_index:end_index,
where start is inclusive and end is exclusive (this might seem like a
weird pattern to you, but it occurs a lot in programming).
Let’s see it in action.
>>> word = 'carpet'
>>> list(word)
['c', 'a', 'r', 'p', 'e', 't']
>>> word[0:3]
'car'
>>> word[3:6]
'pet'
If we don’t specify the start or the end, the slice goes all the way to the start or end of the string respectively.
>>> word[:4]
'carp'
>>> word[2:]
'rpet'
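One nice consequence of the inclusive start and exclusive end is that cutting a string at any index gives two pieces that add back up to the original. Here is a minimal sketch of that idea, with split_at as a made-up variable name. Negative numbers work in slices too.

```python
word = 'carpet'
split_at = 3  # hypothetical cut position

front = word[:split_at]   # everything before index 3
back = word[split_at:]    # everything from index 3 onward
print(front, back)

# The two halves always rebuild the original word.
print(front + back == word)

# Negative indexes count from the end, so this is the last three letters.
last_three = word[-3:]
print(last_three)
```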
Split
Often you will have to split strings based on some pattern or rule. When we split a string we get all the pieces as a list of strings. Let’s see this in action.
>>> phone_number = '217-520-4983'
>>> phone_number.split('-')
['217', '520', '4983']
>>> phone_number.split('-')[0] # area code
'217'
This is a very common scenario when we want to extract pieces of data out of strings. We give split() the string to split on, and the resulting pieces come back in a list.
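Two variations of split() come up often enough to be worth a quick sketch: calling it with no argument, and limiting the number of splits with a second argument. The example strings below are made up.

```python
# Hypothetical input with a doubled space in the middle.
messy = "Harsh  Deep"

# Splitting on one space keeps an empty string where the extra space was.
on_space = messy.split(" ")
print(on_space)

# Splitting with no argument treats any run of whitespace as one separator.
on_whitespace = messy.split()
print(on_whitespace)

# A second argument limits how many splits happen (here at most 1).
record = "harsh,217-520-4983,extra,fields"
first_two = record.split(",", 1)
print(first_two)
```

For user-typed text, the no-argument form is usually the safer choice because stray extra spaces do not produce empty pieces.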
Join
This is basically the reverse of split(): it takes a list of strings
and combines them together, and the string it is called on is what it places in between.
>>> birthday = ['4', '10', '2000']
>>> "/".join(birthday)
'4/10/2000'
Note that the syntax for this is kind of different from split, and this confuses people coming from other languages all the time. The syntax for this is basically
"string I want there to be in the middle".join(array_we_want_to_join)
If you just want to rejoin the pieces without anything in the middle, you can use an empty
string ""
>>> "".join(birthday)
'4102000'
You will often combine split and join together.
>>> name = "Harsh Deep"
>>> name.split(" ")
['Harsh', 'Deep']
>>> "-".join(name.split(" "))
'Harsh-Deep'
See also: why list.join() isn’t used
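One thing that trips people up is that join() only accepts strings. Here is a rough sketch of handling a list of numbers by converting each piece with str() first. The list birthday_numbers is a made-up variation of the example above.

```python
# Hypothetical birthday stored as numbers instead of strings.
birthday_numbers = [4, 10, 2000]

# "/".join(birthday_numbers) would raise a TypeError, because join()
# only accepts strings, so convert each piece first.
birthday_strings = []
for number in birthday_numbers:
    birthday_strings.append(str(number))

formatted = "/".join(birthday_strings)
print(formatted)
```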
Change particular letter
Earlier we were able to do list like syntax for getting single characters, but when we try to apply the same principle to change a single letter that breaks down.
>>> username[1] = 's'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
This is because strings are immutable, which means that they cannot be changed without creating a new string. Strings are not immutable in every language, but they are in Python.
Lists, however, are mutable, which means that they can be changed without
making a whole new list. So if we want to change just one letter, we can
turn the string into a list and then back into a string again.
>>> list(username)
['h', 'a', 'r', 's', 'h', '1', '8', '3']
Just like int(something) or str(something) we can also use list(something)
to convert to lists.
To convert back we can simply join() with an empty string in between. The final code looks like
>>> username_list = list(username)
>>> username_list
['h', 'a', 'r', 's', 'h', '1', '8', '3']
>>> username_list[1] = 's'
>>> username_list
['h', 's', 'r', 's', 'h', '1', '8', '3']
>>> "".join(username_list)
'hsrsh183'
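If you would rather skip the trip through a list, another option is to build the new string out of slices. This is only a sketch of one alternative, and position is a made-up variable name.

```python
username = 'harsh183'
position = 1  # hypothetical index of the letter to change

# Everything before the position, the new letter, then everything after it.
changed = username[:position] + 's' + username[position + 1:]
print(changed)

# The original string is untouched because strings are immutable.
print(username)
```

Either way a brand new string gets created; the list approach is handier when you need to change many letters at once.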
Replace
We can use replace() to replace every occurrence of one substring with another, like
>>> "Hello world".replace("l", "w")
'Hewwo worwd'
This is also a very common task while programming and working with strings.
In practice you can also do it by combining split() and join(), but replace()
is more readable.
>>> "w".join("Hello world".split("l"))
'Hewwo worwd'
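Two details about replace() are worth a quick sketch: it returns a new string rather than changing the original, and it accepts an optional third argument that caps how many replacements are made.

```python
greeting = "Hello world"

# replace() returns a new string with every match swapped.
all_swapped = greeting.replace("l", "w")
print(all_swapped)

# The optional third argument limits the number of replacements.
first_only = greeting.replace("l", "w", 1)
print(first_only)

# The original is unchanged, so remember to store the result.
print(greeting)
```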
Strip
Often when collecting user input or taking strings from real life sources
like websites or printed pages, we get extra spaces at the start and end.
Python’s strip functions strip(), lstrip(), rstrip() handle that case.
>>> " aaaaaaa lots of spacing in between "
' aaaaaaa lots of spacing in between '
>>> " aaaaaaa lots of spacing in between ".strip() # remove both sides
'aaaaaaa lots of spacing in between'
>>> " aaaaaaa lots of spacing in between ".lstrip() # remove left side
'aaaaaaa lots of spacing in between '
>>> " aaaaaaa lots of spacing in between ".rstrip() # remove right side
' aaaaaaa lots of spacing in between'
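The strip functions are not limited to spaces. With no argument they remove any whitespace, including newlines, and you can also pass the characters you want removed. A minimal sketch with made-up input values:

```python
# Hypothetical user input with spaces in front and a newline at the end.
raw_name = "  harsh183\n"
clean_name = raw_name.strip()
print(clean_name)

# Passing characters strips those instead of whitespace.
raw_price = "$$19.99$$"
clean_price = raw_price.strip("$")
print(clean_price)
```

Note that the argument is treated as a set of characters to remove from the ends, not as a word to match.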
Adding
We’ve seen this one before, but we can use + to join strings together side by
side.
>>> "Hello " + "Harsh!"
'Hello Harsh!'
>>> "10" + "10"
'1010'
>>> 10 + 10 # This is different from above - common beginner mistake
20
Make sure that both sides are strings before trying to add them to avoid
errors or surprising results. You can do this with str() if you need to.
>>> "I am " + 18 + " years old"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str, not int
>>> "I am " + str(18) + " years old"
'I am 18 years old'
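As an alternative to chaining + and str(), Python 3.6 and newer also support f-strings, which convert the values for you. This is just a sketch of the same sentence written both ways, assuming a recent enough Python version.

```python
age = 18

# The + version needs an explicit str() around the number.
with_plus = "I am " + str(age) + " years old"

# An f-string converts whatever is inside the braces automatically.
with_fstring = f"I am {age} years old"

print(with_plus)
print(with_fstring)
```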
Multiplying
Just as elementary schools teach integer multiplication as repeated addition, the same idea works with strings.
>>> 2 + 2 + 2 + 2 + 2
10
>>> 2 * 5 # which is the same
10
>>> "2" + "2" + "2" + "2" + "2"
'22222'
>>> "2" * 5 # likewise also the same
'22222'
Note that this only works when multiplying a string by an integer. Multiplying a string by a float or by another string doesn’t make meaningful sense and raises a TypeError.
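A practical use for string multiplication is drawing a divider that matches the length of some text. Here is a rough sketch, which also shows what happens with a float. The variable names are made up.

```python
title = "Working with Strings"

# Repeat a dash once for every character in the title.
underline = "-" * len(title)
print(title)
print(underline)

# The integer can go on either side.
repeated = 3 * "ab"
print(repeated)

# A float is not allowed, even one like 2.5 that "looks" reasonable.
try:
    "2" * 2.5
    float_worked = True
except TypeError:
    float_worked = False
print(float_worked)
```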
Case
In English and many other languages the same letters come in both UPPER CASE and lower case. This will be a pretty common set of functions you will have to deal with while working with strings. Most of these functions are quite similar, so try to notice the pattern between them.
Note: the checking functions ignore numbers and any other non-letter characters
Upper case
>>> 'hARSh dEEp 18'.upper()
'HARSH DEEP 18'
# To check
>>> "HARSH DEEP 18".isupper()
True
>>> "HARSH DEEp 18".isupper()
False
Lower case
>>> 'hARSh dEEp 18'.lower()
'harsh deep 18'
# To check
>>> "harsh deep 18".islower()
True
>>> "harsh deeD 18".islower()
False
Title case
This is the style you see in newspaper headlines, page titles, essays, names etc.
>>> 'hARSh dEEp 18'.title()
'Harsh Deep 18'
# To check
>>> "Harsh Deep 18".istitle()
True
>>> "Harsh DeeP 18".istitle()
False
Swapcase
>>> 'hARSh dEEp 18'.swapcase()
'HarsH DeeP 18'
Note: there isn’t really a good way to check whether swapcase has been applied, because it is just a toggle and doesn’t follow a specific rule.
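A common use for the case functions is comparing text without caring how it was typed. Here is a minimal sketch using made-up email values, plus one check showing what the checking functions do when there are no letters at all.

```python
# Hypothetical values: what the user typed vs. what we have stored.
typed_email = "Harsh@Gmail.COM"
stored_email = "harsh@gmail.com"

# A direct comparison is case sensitive.
exact_match = typed_email == stored_email
print(exact_match)

# Lowering both sides first makes the comparison ignore case.
relaxed_match = typed_email.lower() == stored_email.lower()
print(relaxed_match)

# With no letters at all there is nothing to be upper case,
# so the check comes back False.
no_letters = "18".isupper()
print(no_letters)
```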
startswith and endswith
We can check if strings follow a certain pattern when starting or ending.
>>> "http://www.example.com".startswith("http:")
True
>>> "http://www.example.com".startswith("https:")
False
Here we use startswith() to check if a website address is secure (https) or insecure (http).
>>> "harsh@gmail.com".endswith("gmail.com")
True
>>> "harsh@gmail.com".endswith("yahoo.com")
False
This checks if the email address is from a certain provider (here gmail or yahoo). It is just a simple example to demonstrate endswith() in action.
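Both functions also accept a tuple of options, which saves you from writing several checks joined with or. A short sketch with made-up values:

```python
url = "https://www.example.com"

# True if the string starts with any one of the options in the tuple.
is_web_address = url.startswith(("http:", "https:"))
print(is_web_address)

# Hypothetical file name, checked against two allowed endings.
filename = "notes.txt"
is_text_or_markdown = filename.endswith((".txt", ".md"))
print(is_text_or_markdown)
```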
Find and in
Similar to lists, the in keyword can be used to check if a letter or string exists
within another string.
>>> "I" in "TEAM"
False
>>> "20" in "I am 20 years old"
True
>>> "love" not in "my life"
True
>>> "abc" not in "bcd"
True
find() is a little more complex: use it when you also want the index of the thing you are searching for.
>>> "abcde".find("a")
0
>>> "abcde".find("c")
2
>>> "abcde".find("d")
3
>>> "abcde".find("f") # When something does not exist it gives -1
-1
>>> "abcde".find("de") # Multiple letters too
3
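The index from find() becomes really useful when combined with slicing. Here is a rough sketch that cuts an email address at the @ sign, checking for -1 first. The variable names are made up for illustration.

```python
email = "harsh@gmail.com"

at_position = email.find("@")

if at_position == -1:
    # No @ sign found, so there is nothing sensible to cut.
    user = ""
    domain = ""
else:
    user = email[:at_position]        # everything before the @
    domain = email[at_position + 1:]  # everything after the @

print(user)
print(domain)
```

The -1 check matters because -1 is also a valid index (the last character), so using it in a slice by accident gives a wrong answer instead of an error.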
Exercises
These don’t cover everything, but they are a good point to get you started thinking. It will take much more practice to actually get good at this.
The last lesson also covered testing with pytest, so try using it to check your solutions.
- Convert a phone number from 123-456-7890 to (123) 456 7890
If it’s not 12 characters long then just return "Invalid number"
Here is a starting point, also downloadable here
# Replace this stub with the real function
def convert(number):
    return 0

phone_number = "123-456-7890"
formatted_number = convert(phone_number)
print(formatted_number)

def test_convert():
    assert convert("123-456-7890") == "(123) 456 7890"
    assert convert("217-516-4564") == "(217) 516 4564"
    assert convert("123") == "Invalid number", "Invalid numbers are not properly handled"
- Get the initials of the name and join them together in upper case
  - “Harsh Deep” becomes “HD”
  - “alan turing” becomes “AT”
  - Empty strings give ""
Don’t worry about edge cases like numbers, apostrophes, titles etc.
Here are some test cases I wrote to give an idea of what to achieve. Download here
def initials(name):
    # Code here
    return ""

def test_initials_normal():
    assert initials("Harsh Deep") == "HD"
    assert initials("alan turing") == "AT"
    assert initials("harsh") == "H"

def test_initials_empty_string():
    assert initials("") == ""