Hello friends, today we’ll learn how to split a string in Python and practice with several examples. Splitting a string is a common programming task that you have probably run into countless times in your projects.
Most Common Ways to Split a String in Python
Python is quite a versatile language and provides many ways to split a string. For example, you can use the slice operator, the split() or rsplit() methods, or even Python RegEx to cut a string into several pieces. There are also some lesser-known techniques, such as splitlines() and the partitioning methods.
String Split with split()
Python split() is a built-in method that splits a string into substrings based on the given delimiter. The method scans the input string from left to right and as it finds the delimiter, it considers that as a splitting point.
Python split() Syntax
Python split() takes a separator (default is any run of whitespace) and a maximum number of splits as optional arguments, and returns a list of substrings: str.split(sep=None, maxsplit=-1). It splits the string on the provided separator and performs at most maxsplit splits.
Split String Having a Single Delimiter
Let’s consider a real-world problem where you have a string representing a date in the format “YYYY-MM-DD”, and you want to extract the individual components (year, month, day) using the split() method:
# Real-time Example: Split a date string using a delimiter
date_string = "2022-02-18"
# Using split('-') to separate the date components from the string
year, month, day = date_string.split('-')
print("After splitting the string using the delimiter '-':")
print("Year:", year)
print("Month:", month)
print("Day:", day)
The above code will give the following result.
After splitting the string using the delimiter '-':
Year: 2022
Month: 02
Day: 18
By default, split() separates a string using one or more whitespace characters.
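To make the default behavior concrete, here is a minimal sketch that contrasts split() with no arguments against an explicit single-space separator, and shows maxsplit in action. The sample strings are made up for illustration and are not part of the examples above.

```python
# Hypothetical sample text with irregular spacing (illustration only)
messy = "  alpha   beta\tgamma\n"

# No separator: runs of whitespace act as one delimiter and the edges are trimmed
default_parts = messy.split()
print(default_parts)   # ['alpha', 'beta', 'gamma']

# Explicit single space: every space is a split point, so empty strings appear
space_parts = messy.split(" ")
print(space_parts)     # ['', '', 'alpha', '', '', 'beta\tgamma\n']

# maxsplit limits the number of splits; the remainder is left untouched
limited_parts = "a,b,c,d".split(",", 2)
print(limited_parts)   # ['a', 'b', 'c,d']
```

If your input may contain repeated or mixed whitespace, calling split() without arguments is usually the safer choice.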
Split String Having Multiple Delimiters
Let’s consider a slightly harder problem where you have a log string. It contains information about user activities, and you want to extract user IDs and actions. The log string follows the pattern "UserID:Action|UserID:Action|...".
This example showcases using split() to divide a string that has multiple delimiters.
# Extract user IDs and actions from a log string
log_data = "123:login|456:logout|789:login|321:logout"
# The string has multiple delimiters: '|' and ':'
# Let's use split('|') to separate individual user actions
# and split(':') to split within each part
user_actions = [entry.split(':') for entry in log_data.split('|')]
print("After splitting the string using the delimiters '|', ':':")
for user_action in user_actions:
    user_id, action = user_action
    print("User ID:", user_id, "Action:", action)
After running the code, it gives the following output.
After splitting the string using the delimiters '|', ':':
User ID: 123 Action: login
User ID: 456 Action: logout
User ID: 789 Action: login
User ID: 321 Action: logout
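This nested-split pattern is particularly useful for parsing simple structured data. For real CSV files, prefer the csv module, since plain split() does not understand quoted fields.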
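One variation worth knowing, sketched here under the assumption that an action might itself contain a colon (the third entry below is a made-up case), is to pass maxsplit=1 so that only the first colon separates the user ID from the rest:

```python
# Hypothetical log string; the last action contains an extra ':' on purpose
log_data = "123:login|456:logout|789:login:mobile"

actions_by_user = {}
for entry in log_data.split("|"):
    # maxsplit=1 guarantees exactly two parts, so unpacking cannot fail
    user_id, action = entry.split(":", 1)
    actions_by_user[user_id] = action

print(actions_by_user)
# {'123': 'login', '456': 'logout', '789': 'login:mobile'}
```

Without maxsplit=1, the last entry would produce three parts and the two-name unpacking would raise a ValueError.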
Split String into an Array|List of Numbers
Let’s consider a real-world scenario where you have a string representing the grades of students in a class. You want to analyze and store these grades in both a list and a Python array for further processing.
# Split a string of student grades into a list and an array of integers
from array import array
grades_data = "90 85 92 78 88"
# Using split() to separate individual grades and map them to integers
grades_list = list(map(int, grades_data.split()))
# Using split() and array() to create an array of grades
grades_array = array('i', map(int, grades_data.split()))
print("After splitting the string of student grades:")
print("List of Grades:", grades_list)
print("Array of Grades:", grades_array)
You will get the following output upon execution.
After splitting the string of student grades:
List of Grades: [90, 85, 92, 78, 88]
Array of Grades: array('i', [90, 85, 92, 78, 88])
In this example, we demonstrated how to split a string into an array or a list of numbers.
String Split with rsplit()
Python rsplit() is another built-in method that splits a string into substrings based on the given delimiter, but it starts splitting the string from the right. The method scans the input string from right to left, and as it finds the delimiter, it considers that as a splitting point.
Python rsplit() Syntax
Python rsplit() takes a separator (default is any run of whitespace) and a maximum number of splits as optional arguments, returning a list of substrings. It splits a string based on the provided separator, starting from the end of the string and limiting the splits according to maxsplit.
The above picture not only conveys the Python rsplit() syntax but also points out the differences between rsplit and split.
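A common place where splitting from the right pays off is separating a file extension from the rest of a name. The following is a small sketch; the filename is a made-up example.

```python
# Hypothetical filename with several dots (illustration only)
filename = "backup.2024.tar.gz"

# rsplit with maxsplit=1 cuts at the LAST dot
stem, extension = filename.rsplit(".", 1)
print(stem, extension)   # backup.2024.tar gz

# split with maxsplit=1 cuts at the FIRST dot instead
first, rest = filename.split(".", 1)
print(first, rest)       # backup 2024.tar.gz
```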
Difference Between split() and rsplit()
The default behavior of split() and rsplit() is equivalent when no specific delimiter is provided. Both methods split the string into a list of substrings based on whitespace. Therefore, in the below example, both calls would give the same output.
sentence = "Python split method example"
print()
print(sentence.split())
print()
print(sentence.rsplit())
To observe a difference between split() and rsplit(), provide a maximum number of splits. In both methods, the maxsplit parameter specifies the maximum number of splits to perform.
For example, maxsplit=2 means the string will be split at most 2 times. This results in up to 3 parts: the part before the first occurrence of the delimiter, the part between the first and second occurrences, and the part after the second occurrence. split() counts those occurrences from the left, while rsplit() counts them from the right.
Here’s an illustration:
sentence = "Python split method example"
# Using maxsplit=2
print(sentence.split(" ", 2))
# Result1: ['Python', 'split', 'method example']
print(sentence.rsplit(" ", 2))
# Result2: ['Python split', 'method', 'example']
Split to the Max Splits Possible
You can create a loop to iterate from 1 to the maximum possible number of splits and print the output along with the number of splits done. Here’s an example:
sentence = "Python split method example"
# Using split
print("Using split():")
for i in range(1, len(sentence.split()) + 1):
    result = sentence.split(" ", i)
    print(f'{i} split(s): {result}')
sentence = "Python rsplit method example"
# Using rsplit
print("\nUsing rsplit():")
for i in range(1, len(sentence.split()) + 1):
    result = sentence.rsplit(" ", i)
    print(f'{i} split(s): {result}')
In this example, the loop iterates from 1 to the number of words in the sentence and prints the output along with the number of splits requested for both split() and rsplit(). A sentence with four words has only three separators, so the final iteration returns the same list as the one before it. Adjust the loop range according to your specific needs.
String Split Using Python RegEx
Python provides the re module, which includes the re.split() function. It uses regular expressions to split a string into substrings.
Python RegEx Split() Syntax
The re.split() function in Python uses a pattern to split a string. It takes a regular expression pattern, the string to be split, and the optional maxsplit and flags parameters. It returns a list of parts based on the pattern, making it handy for complex splitting using regular expressions.
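As a quick illustration of the syntax, the sketch below splits on commas or semicolons with optional surrounding whitespace, once without a limit and once with maxsplit=2. The input string is a made-up example.

```python
import re

# Hypothetical input with inconsistent separators and spacing
colors = "red , green;blue ;  yellow"

# Split on ',' or ';' and swallow any whitespace around the separator
all_parts = re.split(r"\s*[,;]\s*", colors)
print(all_parts)       # ['red', 'green', 'blue', 'yellow']

# maxsplit=2 stops after two splits and leaves the rest as one piece
limited_parts = re.split(r"\s*[,;]\s*", colors, maxsplit=2)
print(limited_parts)   # ['red', 'green', 'blue ;  yellow']
```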
Python RegEx Split Multiple Delimiters
Let’s consider a scenario where you have a string with multiple delimiters. It contains both names and phone numbers in a single line, and you want to separate them into a more structured format. The names are followed by phone numbers, and they are separated by different separators.
Here’s an example string:
"John:123-456-7890, Jane/987-654-3210; Bob-555-1234"
Now, you want to split this string using re.split() to extract and separate the names and phone numbers. The regular expression pattern treats :, ,, /, and ; as separators. A hyphen counts as a separator only when it directly follows a letter (as in Bob-555-1234), so the hyphens inside the phone numbers are left intact.
import re
input_string = "John:123-456-7890, Jane/987-654-3210; Bob-555-1234"
# Define the pattern: any of : ; , / or a hyphen that follows a letter
pattern = r'[:;,/]|(?<=[A-Za-z])-'
# Use re.split() to separate names and phone numbers
result = re.split(pattern, input_string)
# Strip whitespace and filter out empty strings from the result
result = [part.strip() for part in result if part.strip()]
# Separate names and phone numbers
names = result[::2]
phone_numbers = result[1::2]
print("Names:", names)
print("Phone Numbers:", phone_numbers)
When you run the code, it generates the following result:
Names: ['John', 'Jane', 'Bob']
Phone Numbers: ['123-456-7890', '987-654-3210', '555-1234']
Python Keep Delimiter Using re.split()
Let’s consider a scenario where you have a string containing a mathematical expression. Now, you want to split it into individual operands and operators while keeping the delimiters (operators) as part of the result.
Here’s an example string:
expression = "3 + 5 * 2 - 8 / 4"
Now, you want to use re.split() to split this string while keeping the mathematical operators as part of the result.
import re
expression = "3 + 5 * 2 - 8 / 4"
# Define the pattern to match operators
pattern = r'(\+|\-|\*|\/)'
# Use re.split() to split the expression while keeping operators
result = re.split(pattern, expression)
# Filter out empty strings from the result
result = [part.strip() for part in result if part.strip()]
print("Result:", result)
Upon running the code, you will get the following result:
Result: ['3', '+', '5', '*', '2', '-', '8', '/', '4']
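In this example, we use re.split() with a pattern wrapped in a capturing group (the parentheses). When the pattern contains a capturing group, re.split() also returns the text matched by that group, which is why the operators appear in the list. The outcome is a list that includes both operands and distinct operators. This technique proves beneficial when analyzing or manipulating mathematical expressions in your code.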
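The effect of the capturing group is easiest to see side by side. This minimal sketch uses a shorter, made-up expression and the same re.split() call with and without parentheses around the pattern:

```python
import re

# Hypothetical expression without spaces (illustration only)
expression = "10+20-5"

# No capturing group: the delimiters are discarded
without_group = re.split(r"[+\-]", expression)
print(without_group)   # ['10', '20', '5']

# Capturing group: the matched delimiters are kept in the result
with_group = re.split(r"([+\-])", expression)
print(with_group)      # ['10', '+', '20', '-', '5']
```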
Split String by Character Count
In Python, you can split a string into substrings of the given character count using various methods. The most basic way to do it is by iterating through the string and creating substrings of the desired length. We’ll explain two common approaches: using a loop and using list comprehension.
Split String in a Loop by Char Count
One way to split a string into substrings of a specified character count is by using a loop. Here’s a detailed example:
def str_split_by_count(in_str, count):
    result = []
    cur_pos = 0
    while cur_pos < len(in_str):
        substr = in_str[cur_pos:cur_pos + count]
        result.append(substr)
        cur_pos += count
    return result
# Example usage
in_str = "abcdefghij"
char_ctr = 3
outlist = str_split_by_count(in_str, char_ctr)
print(outlist)
In this example, the function str_split_by_count takes the input string and the desired character count. It initializes an empty list (result) and a variable (cur_pos) to keep track of the current position in the string. The loop iterates through the string, creating substrings of the specified count and appending them to the result list.
Split with List Comprehension by Char Count
Another concise way to achieve the same result is by using list comprehension:
def str_split_by_count(in_str, count):
    return [in_str[i:i + count] for i in range(0, len(in_str), count)]
# Example / use case
in_str = "abcdefghij"
char_ctr = 3
outlist = str_split_by_count(in_str, char_ctr)
print(outlist)
In this example, str_split_by_count() uses the range function to generate start indices at intervals of the specified count, and the list comprehension slices one substring of that length at each index.
More Examples of String Split by Character Count
Let’s explore a few more examples to illustrate the flexibility of these functions:
# Use case 1
in_str = "abcdefghijkl"
char_ctr = 4
res_list = str_split_by_count(in_str, char_ctr)
print(res_list)
# Output: ['abcd', 'efgh', 'ijkl']
# Use case 2
in_str = "pythonisawesome"
char_ctr = 2
res_list = str_split_by_count(in_str, char_ctr)
print(res_list)
# Output: ['py', 'th', 'on', 'is', 'aw', 'es', 'om', 'e']
# Use case 3
in_str = "abcdefghij"
char_ctr = 5
res_list = str_split_by_count(in_str, char_ctr)
print(res_list)
# Output: ['abcde', 'fghij']
In these examples, you can see how the functions adapt to different input strings and character counts, effectively splitting the strings as desired.
In summary, these approaches provide flexible ways to split a Python string by the given character count. Whether you prefer a loop or a more concise list comprehension, you can choose the method that best fits your coding style and requirements.
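One edge case neither version above guards against is a count of zero or less: the while loop would never advance, and range() rejects a step of 0. A possible hardened variant is sketched below; the function name split_by_count is a hypothetical helper chosen for this illustration.

```python
def split_by_count(text, count):
    """Return consecutive chunks of `text`, each at most `count` characters long."""
    # Reject non-positive counts up front instead of looping forever
    if count <= 0:
        raise ValueError("count must be a positive integer")
    return [text[i:i + count] for i in range(0, len(text), count)]


print(split_by_count("abcdefghij", 3))   # ['abc', 'def', 'ghi', 'j']

try:
    split_by_count("abcdefghij", 0)
except ValueError as err:
    print("Rejected:", err)
```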
Split a String in Python by Using splitlines()
In Python, the splitlines() method is a convenient way to split a string into a list of lines. This method considers various newline characters such as ‘\n’, ‘\r’, and ‘\r\n’, making it suitable for handling text files created on different platforms. We’ll go into the details of using the splitlines() function with multiple examples to illustrate its functionality.
Split String with Newline
The splitlines() method can be applied to a string object, and it returns a list of lines:
text = "Hello\nWorld\nPython"
lines = text.splitlines()
print(lines)
In this example, the string “Hello\nWorld\nPython” contains newline characters. When splitlines() is applied, it recognizes these characters and creates a list with three elements, each representing a line:
['Hello', 'World', 'Python']
Split String with Multiple Delimiters
One of the strengths of the Python splitlines() method is its ability to handle multiple delimiters or line endings. For instance, consider a string with a mix of ‘\n’ and ‘\r\n’ line delimiters:
mixed_line_endings = "Line 1\nLine 2\r\nLine 3\nLine 4"
lines = mixed_line_endings.splitlines()
print(lines)
The result is a list with four elements, each corresponding to a line, regardless of the line ending used:
['Line 1', 'Line 2', 'Line 3', 'Line 4']
Keep Line Endings
By default, splitlines() removes the line endings from the resulting lines. However, you can preserve them by setting the keepends parameter to True:
text = "Hello\nWorld\nPython"
lines_with_endings = text.splitlines(keepends=True)
print(lines_with_endings)
This will output a list with the line endings intact:
['Hello\n', 'World\n', 'Python']
Handle Empty Lines
splitlines() handles empty lines gracefully. Consider the following example:
text_with_empty_lines = "Line 1\n\nLine 3"
lines = text_with_empty_lines.splitlines()
print(lines)
The output is a list with three elements, including an empty string representing the empty line:
['Line 1', '', 'Line 3']
Additional Examples
Let’s explore a few more examples to showcase the versatility of splitlines():
# Eg. 1
poem = "The path not used\nTwo paths splitted in a wood,\nAnd sorry he could not walk both"
lines = poem.splitlines()
print(lines)
# Output: ['The path not used', 'Two paths splitted in a wood,', 'And sorry he could not walk both']
# Eg. 2
address = "786 Ind St.\nBlock 4\nAmbience"
lines = address.splitlines()
print(lines)
# Output: ['786 Ind St.', 'Block 4', 'Ambience']
# Eg. 3
multiline_string = """This is a
multiline
string."""
lines = multiline_string.splitlines()
print(lines)
# Output: ['This is a', 'multiline', 'string.']
In summary, the splitlines() method in Python is a versatile tool for splitting strings into lines, handling various line endings, preserving or removing line endings as needed, and gracefully managing empty lines. It is particularly useful when working with text data that may come from different sources with different newline conventions.
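It can also help to see how splitlines() differs from split('\n'), since the two are easy to confuse. The sketch below uses a made-up string with a trailing newline and also shows that keepends=True allows the original text to be rebuilt exactly.

```python
# Hypothetical text that ends with a newline (illustration only)
text = "first\nsecond\n"

# splitlines() does not produce a trailing empty string
print(text.splitlines())    # ['first', 'second']

# split('\n') does, because it splits after the final newline too
print(text.split("\n"))     # ['first', 'second', '']

# An empty string gives an empty list with splitlines(), but [''] with split
print("".splitlines(), "".split("\n"))   # [] ['']

# With keepends=True the pieces can be joined back into the original text
rebuilt = "".join(text.splitlines(keepends=True))
print(rebuilt == text)      # True
```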
Using partition() and rpartition()
In Python, the partition() and rpartition() methods split strings into three parts using a chosen delimiter. Unlike split(), which creates a list, these methods divide the string into three components: the part before the delimiter, the delimiter itself, and the part after the delimiter.
This three-part structure simplifies handling and adds clarity to string manipulation. Let’s explore these advanced string-splitting techniques with detailed examples.
Split String by Partition()
The partition() method splits a string into three parts using the first occurrence of a specified delimiter. The result is a tuple containing the part before the delimiter, the delimiter itself, and the part after the delimiter.
Let’s consider a scenario where you have a log entry with information about a user and their activity, and you want to extract details such as the username, action, and timestamp.
# Example log entry
log_entry = "user123 performed action: 'edit' at timestamp: '2022-03-01 08:45:00'"
# Using partition() to split the log entry
user, _, remaining = log_entry.partition(' performed action: ')
action, _, timestamp = remaining.partition(" at timestamp: ")
# Printing the results
print("Original Log Entry:", log_entry)
print("User:", user)
print("Action:", action)
print("Timestamp:", timestamp)
This program prints the following result:
Original Log Entry: user123 performed action: 'edit' at timestamp: '2022-03-01 08:45:00'
User: user123
Action: 'edit'
Timestamp: '2022-03-01 08:45:00'
In this example, we use the partition(' performed action: ') method to split the log entry into three parts. Subsequently, the partition(" at timestamp: ") method further splits the remaining part into the action and timestamp. This technique is valuable for parsing log entries and similar semi-structured text.
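A useful property of partition() is that it always returns three items, even when the delimiter is missing. The sketch below relies on that to parse "key=value" settings; the helper name parse_setting and the sample lines are hypothetical, chosen only for this illustration.

```python
def parse_setting(line):
    """Split a 'key=value' line; return (key, None) when there is no '='."""
    key, separator, value = line.partition("=")
    # An empty separator means the delimiter was not found
    if not separator:
        return key.strip(), None
    return key.strip(), value.strip()


print(parse_setting("timeout = 30"))    # ('timeout', '30')
print(parse_setting("path=/tmp/a=b"))   # ('path', '/tmp/a=b')
print(parse_setting("verbose"))         # ('verbose', None)
```

Because partition() cuts only at the first occurrence, any later '=' characters stay inside the value.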
Split String by rpartition()
The rpartition() method is similar to partition(), but it searches for the last occurrence of the specified delimiter. Like partition(), it returns a tuple with the part before the delimiter, the delimiter itself, and the part after the delimiter.
Let’s consider a more challenging problem where you have a string representing a mathematical expression. Now, you want to extract info about the expression’s components using rpartition().
# Example math expr
math_expr = "3 * (5 + 2) / 4"
# Using rpartition() to split the math expr
operator, _, operand = math_expr.rpartition(' ')
# Print the results
print("Original Math Expression:", math_expr)
print("Operator:", operator)
print("Operand:", operand)
The result of the above code is as follows:
Original Math Expression: 3 * (5 + 2) / 4
Operator: 3 * (5 + 2) /
Operand: 4
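This example uses rpartition(' ') to split the expression at its last space. The first element holds everything before that space (here the left-hand portion ending in the ‘/’ operator, despite the variable name operator), the second is the space itself, and the third is the final operand. While handling complex expressions can be challenging, this showcases how rpartition() aids in extracting information from intricate strings.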
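Another everyday use of rpartition() is separating a directory from a file name. The following sketch uses made-up paths and also shows what happens when the delimiter is absent.

```python
# Hypothetical path (illustration only)
path = "/var/log/app/server.log"

# rpartition cuts at the LAST '/'
directory, _, file_name = path.rpartition("/")
print(directory)   # /var/log/app
print(file_name)   # server.log

# When the delimiter is missing, the whole string lands in the LAST slot
print("server.log".rpartition("/"))   # ('', '', 'server.log')
```

For real file-system work, the standard os.path or pathlib modules are usually a better fit; this only illustrates the string method.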
Points to Consider for Coding
When choosing a string-splitting method, consider the following:
Time and Space Complexity
- split(): It takes O(n) time, where n is the string’s length. It generates a list of substrings, resulting in O(n) space complexity.
- rsplit(): Same O(n) time and space as split(); it differs only in that it splits from the right end.
- Regex-based Splitting: Time complexity varies based on regex pattern complexity. Simple patterns have linear time complexity, while complex ones may take more time. Memory usage depends on the number of resulting substrings.
Choosing the Right Method
- For basic splitting by spaces or a single delimiter, opt for split() or rsplit().
- For handling complex patterns or multiple delimiters, consider utilizing re.split().
- If performance is crucial, especially with large strings, conduct tests and profile your code to identify the most efficient method.
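As a starting point for such tests, here is a rough sketch using the standard timeit module to compare str.split() with a precompiled regex on the same input. The sample data and repetition count are arbitrary assumptions, and the measured times will vary by machine, so no expected timings are shown.

```python
import re
import timeit

# Hypothetical sample: 1000 comma-separated numbers
sample = ",".join(str(n) for n in range(1000))
comma = re.compile(",")

# Both approaches must produce the same result for a fair comparison
assert sample.split(",") == comma.split(sample)

# Time 200 repetitions of each approach
str_time = timeit.timeit(lambda: sample.split(","), number=200)
regex_time = timeit.timeit(lambda: comma.split(sample), number=200)

print(f"str.split : {str_time:.4f} s")
print(f"re.split  : {regex_time:.4f} s")
```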
Practice with More Examples
Below are some useful examples of splitting strings in Python. Get ready with your Python IDE.
Parsing CSV Data
One of the most common use cases for string splitting is parsing CSV (Comma-Separated Values) data:
import csv
# Parsing CSV data
csv_data = "name,age,email\nRama,30,rama@ramayan.com\nSita,25,sita@ramayan.com"
csv_reader = csv.reader(csv_data.splitlines())
for row in csv_reader:
    print(row)
Output:
['name', 'age', 'email']
['Rama', '30', 'rama@ramayan.com']
['Sita', '25', 'sita@ramayan.com']
Here, we split the CSV data into lines and then call csv.reader to parse it into rows.
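The reason to reach for csv.reader rather than split(',') shows up as soon as a field contains a comma inside quotes. A minimal sketch with a made-up record:

```python
import csv

# Hypothetical record whose second field contains a comma inside quotes
line = 'Rama,"Ayodhya, India",30'

# Plain split() breaks the quoted field into two pieces
naive = line.split(",")
print(naive)    # ['Rama', '"Ayodhya', ' India"', '30']

# csv.reader understands the quoting and keeps the field whole
parsed = next(csv.reader([line]))
print(parsed)   # ['Rama', 'Ayodhya, India', '30']
```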
Tokenizing Text
Tokenization is a crucial step in natural language processing (NLP). It involves splitting text into words or tokens:
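import nltk
# Tokenizing text using NLTK (Natural Language Toolkit)
# Please ensure that the nltk module is installed (pip install nltk).
# word_tokenize also needs NLTK's tokenizer data, e.g. via nltk.download('punkt');
# the exact resource name can vary by NLTK version.
text = "Tokenization is an important NLP task."
tokens = nltk.word_tokenize(text)
print(tokens)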
NLTK’s word_tokenize function splits the string in the following manner.
['Tokenization', 'is', 'an', 'important', 'NLP', 'task', '.']
In this example, we use the NLTK library for tokenization, which is more robust than simple string splitting for handling natural language text.
Extracting URLs from Text
Extracting URLs from a block of text is a common task when dealing with web scraping or text analysis:
import re
# Extracting URLs from text using regex
text = "Visit our website at https://www.ramayan.com. For more info, go to http://ramayan.org."
urls = re.findall(r'https?://\S+', text)
print(urls)
After execution, you will get the following output.
['https://www.ramayan.com.', 'http://ramayan.org.']
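In this example, the regex pattern https?://\S+ matches HTTP or HTTPS URLs. Note that \S+ matches every non-whitespace character, so it also captures trailing punctuation, which is why each URL above ends with a period.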
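If the sentence punctuation at the end of each match is unwanted, one simple option, sketched here with the same sample text, is to strip it afterwards with rstrip(). The set of characters to strip is an assumption for this illustration; adjust it to your data.

```python
import re

text = "Visit our website at https://www.ramayan.com. For more info, go to http://ramayan.org."

# Find candidate URLs, then strip common trailing punctuation from each one
raw_urls = re.findall(r"https?://\S+", text)
clean_urls = [url.rstrip(".,;:!?") for url in raw_urls]

print(clean_urls)   # ['https://www.ramayan.com', 'http://ramayan.org']
```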
Splitting Multiline Text
Splitting multiline text into paragraphs or sentences is useful for text processing tasks. Check the following example.
# Split multiline string into paragraphs
text = """Para 1: This is the first para.

Para 2: This is the second para.

Para 3: And this is the third para."""
paras = text.split('\n\n')  # Assuming double line breaks between paras
print(paras)
The above Python program prints the following.
['Para 1: This is the first para.', 'Para 2: This is the second para.', 'Para 3: And this is the third para.']
In this example, we split the text into paragraphs by detecting double-line breaks (\n\n).
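Real text does not always separate paragraphs with exactly two newlines; a "blank" line may contain spaces, or there may be several blank lines in a row. One way to tolerate that, sketched here with a made-up string, is to split with a regular expression instead:

```python
import re

# Hypothetical text: the second gap has a space on the blank line plus an extra newline
text = "First para.\n\nSecond para.\n \n\nThird para."

# Plain split('\n\n') leaves stray whitespace behind
print(text.split("\n\n"))
# ['First para.', 'Second para.\n ', 'Third para.']

# The regex treats any whitespace-only gap between two newlines as one separator
paragraphs = re.split(r"\n\s*\n", text)
print(paragraphs)
# ['First para.', 'Second para.', 'Third para.']
```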
Before You Leave
In this guide, we’ve covered diverse ways to split strings in Python. We started with split() and rsplit() along with their differences, then moved on to other techniques such as regular expressions, splitlines(), and partitioning. Along the way, we added practical and realistic examples that you can apply in your own code.
Happy coding,
TechBeamers.