
Why splitlines() instead of split("\n")?

Trey Hunner
3 min. read. Python 3.10 to 3.14

Strings in Python have a splitlines method.

But why does the splitlines method exist? Couldn't we just pass a newline character to the split method?

Splitting on \n

Let's say we have some text that came from a file:

>>> poem = "old pond\nfrog leaping\nsplash"

If we wanted to split this text up into lines, we could use the string splitlines method:

>>> poem.splitlines()
['old pond', 'frog leaping', 'splash']

That works... but why does that splitlines method exist?

Couldn't we just pass \n to the string split method, instead?

>>> poem.split("\n")
['old pond', 'frog leaping', 'splash']

While splitting on a line feed character will often work, I recommend using the string splitlines method instead, because splitlines handles common scenarios that the split method doesn't.

Splitting on different types of newlines

Let's say we have some text that was retrieved from a database, and the original text came from a form submission in a web browser.

>>> text = "I really enjoyed the class.\r\n\r\nI'm starting to like Python!"

Web browsers often represent line breaks as a carriage return character, followed by a line feed character:

>>> line_break = "\r\n"

That's what we see in our text as well: \r followed by \n. This is often called CRLF (carriage return and line feed) whereas \n is called LF (line feed).

If we use the string split method to split by \n, we would see that each line ends in a carriage return character:

>>> text.split('\n')
['I really enjoyed the class.\r', '\r', "I'm starting to like Python!"]

That's kind of confusing!

If your application reads text from different sources and you don't deliberately normalize your line endings, you may end up with a mix of different line endings for different types of text. It's unfortunately common for some text in an application to use \n line endings, while other text uses \r\n line endings.

The splitlines method properly splits on all line endings:

>>> text.splitlines()
['I really enjoyed the class.', '', "I'm starting to like Python!"]

Our lines don't end with a carriage return anymore.
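
To make the mixed-endings case concrete, here's a small sketch. The `mixed` string is made up for illustration; it combines CRLF, LF, and a lone carriage return in a single piece of text:

```python
# Hypothetical text that mixes three line-ending styles
mixed = "first\r\nsecond\nthird\rfourth"

# split("\n") only knows about line feeds, so carriage returns leak through
split_result = mixed.split("\n")
print(split_result)  # ['first\r', 'second', 'third\rfourth']

# splitlines() treats \r\n, \n, and \r each as a single line boundary
lines_result = mixed.splitlines()
print(lines_result)  # ['first', 'second', 'third', 'fourth']
```

One caveat worth knowing: splitlines also treats a handful of less common characters (such as form feed, vertical tab, and the Unicode line separator) as line boundaries. If your text might contain those, check that splitting on them is what you want.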

If all external text read by your application comes from user input and from text files, your line endings may already be normalized to use a single line feed character (\n).
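
The text-file part of that comes from how Python's built-in open function behaves in its default text mode: it translates line endings to \n as it reads. Here's a minimal sketch of that behavior, assuming a throwaway file (the `crlf.txt` name and the temporary directory are just for illustration):

```python
import os
import tempfile

# Write CRLF line endings verbatim (newline="" disables translation on write)
path = os.path.join(tempfile.mkdtemp(), "crlf.txt")
with open(path, "w", newline="") as file:
    file.write("old pond\r\nfrog leaping\r\n")

# Default text mode translates \r\n (and a lone \r) to \n while reading
with open(path) as file:
    normalized = file.read()

# Passing newline="" while reading keeps the original line endings
with open(path, newline="") as file:
    raw = file.read()

print(repr(normalized))  # 'old pond\nfrog leaping\n'
print(repr(raw))         # 'old pond\r\nfrog leaping\r\n'
```

Text that reaches your program some other way (a database, a network response, bytes you decode yourself) doesn't get this translation, which is where stray carriage returns tend to come from.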

But there's another issue that often crops up when splitting text into lines: trailing newlines.

Handling trailing newlines

There's a common convention that text files should end with a newline character. This makes it easier to append to the end of a file.

Here's text that we've read from a file:

>>> with open("poem.txt") as file:
...     poem = file.read()
...
>>> poem
'old pond\nfrog leaping\nsplash\n'

Note that this text ends with a line feed character (\n).

If we use the string split method to split this text into lines, we'll end up with a blank line at the end:

>>> poem.split('\n')
['old pond', 'frog leaping', 'splash', '']

We could avoid this by removing the final newline character before calling the split method, using either the rstrip method (which strips every trailing newline) or the removesuffix method (which removes at most one):

>>> poem.rstrip('\n').split('\n')
['old pond', 'frog leaping', 'splash']
>>> poem.removesuffix('\n').split('\n')
['old pond', 'frog leaping', 'splash']

Or we could just use the splitlines method:

>>> poem.splitlines()
['old pond', 'frog leaping', 'splash']

The splitlines method is aware of this common convention and it will automatically remove a trailing newline character, if it finds one.
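
A few edge cases show how this trailing-newline handling plays out. This is a quick sketch using made-up strings, not an exhaustive list of behaviors:

```python
# Only one trailing newline is dropped, so an intentional blank line survives
double = "splash\n\n".splitlines()
print(double)  # ['splash', '']

# An empty string has no lines according to splitlines...
empty_lines = "".splitlines()
print(empty_lines)  # []

# ...but split("\n") returns a list containing one empty string
empty_split = "".split("\n")
print(empty_split)  # ['']

# keepends=True keeps each line's ending attached to the line
kept = "old pond\nfrog leaping\n".splitlines(keepends=True)
print(kept)  # ['old pond\n', 'frog leaping\n']
```

The empty-string case is handy in practice: looping over `text.splitlines()` does nothing for empty text, while looping over `text.split("\n")` runs once with an empty line.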

Use splitlines, not split("\n")

So the next time you need to split a string into lines, don't use the string split method. Use the splitlines method.

The splitlines method will split on any line endings, and it will remove a final newline, if there is one.
