Text Processing in Java

The document covers key concepts in Java programming related to the String and Character classes, including immutability, mutable strings with StringBuilder, and the use of regular expressions for string manipulation. It explains methods for character processing, string formatting, and tokenization, as well as the use of Pattern and Matcher classes for regular expression handling. The content aims to provide foundational knowledge for working with strings and characters in Java applications.

Java How to Program, 10/e

© Copyright 1992-2015 by Pearson Education, Inc. All Rights Reserved.
• Review char, the Character class and the String class.
• What does it mean for the String class to be immutable?
• Use the StringBuilder class to work with mutable strings.
• Learn about regular expressions.
• Use regular expressions for matching and splitting strings.

 Purpose of the type-wrapper classes: enable primitive-type values to be treated as objects.
▪ Boolean, Character, Double, Float, Byte, Short, Integer and Long
 Autoboxing: automatic conversion of a char value to a Character object (unboxing is the reverse conversion).
▪ The same applies to the other primitive types and their respective wrapper classes.
 Most Character methods are static methods designed for convenience in processing individual char values.
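A minimal sketch of autoboxing and unboxing with Character. The class name AutoboxingDemo and the method boxChars are hypothetical, chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; names and sample values are illustrative only.
class AutoboxingDemo {
    // Collections hold objects, so each char is autoboxed to a Character.
    static List<Character> boxChars(String text) {
        List<Character> boxed = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            boxed.add(text.charAt(i)); // autoboxing: char -> Character
        }
        return boxed;
    }

    public static void main(String[] args) {
        Character boxed = 'd';   // autoboxing
        char primitive = boxed;  // unboxing
        // Static convenience method that works directly on a char value.
        System.out.println(Character.isLetter(primitive));
        System.out.println(boxChars("abc"));
    }
}
```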
 A program may contain character literals, as in 'd'.
▪ Characters include letters, digits, punctuation, space, tab, newline, symbols and others.
▪ A char is stored as a two-byte (16-bit) unsigned integer holding a UTF-16 code unit of Unicode.
 Character method charValue returns the char value stored in the Character object.
 Character method toString returns the String representation of the char value stored in the object.
 Character method equals determines whether two Characters have the same contents.
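The three instance methods above could be exercised as follows. This is only a sketch; CharacterObjectDemo and describe are hypothetical names.

```java
// Hypothetical demo class; not taken from the slides.
class CharacterObjectDemo {
    // Uses charValue, toString and equals on two Character objects.
    static String describe(Character c1, Character c2) {
        return c1.charValue() + " and " + c2.toString()
                + (c1.equals(c2) ? " are equal" : " are not equal");
    }

    public static void main(String[] args) {
        System.out.println(describe('A', 'a')); // arguments are autoboxed
        System.out.println(describe('x', 'x'));
    }
}
```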

 Character method forDigit converts its first argument (an integer digit value) into the character that represents it in the base (radix, 2 to 36) specified by its second argument.
▪ Character.forDigit(0, 16) → '0'
▪ Character.forDigit(9, 16) → '9'
▪ Character.forDigit(10, 16) → 'a'
▪ Character.forDigit(15, 16) → 'f'
 Character method digit converts its first argument (a character) into its integer value in the number system specified by its second argument.
▪ Character.digit('a', 16) → 10
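As a sketch of how the two methods complement each other, the following converts a non-negative int to a digit string and back. RadixDemo, toRadix and fromRadix are hypothetical helpers written for this illustration; in practice Integer.toString(int, int) and Integer.parseInt(String, int) already do this.

```java
// Hypothetical demo class; assumes non-negative values and a radix from 2 to 36.
class RadixDemo {
    // Builds the digit string from least to most significant digit, then reverses it.
    static String toRadix(int value, int radix) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        if (value == 0) {
            return "0";
        }
        StringBuilder digits = new StringBuilder();
        while (value > 0) {
            digits.append(Character.forDigit(value % radix, radix));
            value /= radix;
        }
        return digits.reverse().toString();
    }

    // Accumulates the value digit by digit; Character.digit returns -1 for an invalid digit.
    static int fromRadix(String text, int radix) {
        int result = 0;
        for (int i = 0; i < text.length(); i++) {
            int digit = Character.digit(text.charAt(i), radix);
            if (digit < 0) {
                throw new IllegalArgumentException("invalid digit: " + text.charAt(i));
            }
            result = result * radix + digit;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toRadix(255, 16));   // prints ff
        System.out.println(fromRadix("ff", 16)); // prints 255
    }
}
```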

 Class String represents immutable strings: once a String object is created, its contents cannot change.
▪ String literals (stored in memory as String objects) are written as a sequence of characters in double quotation marks.
 Class StringBuilder represents mutable strings.
 Both classes are in package java.lang.
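A small sketch contrasting the two classes. ImmutabilityDemo and its methods are hypothetical names used only here.

```java
// Hypothetical demo class; not taken from the slides.
class ImmutabilityDemo {
    // String methods return new String objects; the original is never modified.
    static String afterStringCalls(String s) {
        s.toUpperCase(); // result discarded, s is unchanged
        s.concat("!");   // result discarded, s is unchanged
        return s;
    }

    // StringBuilder methods modify the builder in place.
    static String afterBuilderCalls(String s) {
        StringBuilder builder = new StringBuilder(s);
        builder.append("!");
        builder.reverse();
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(afterStringCalls("hello"));  // prints hello
        System.out.println(afterBuilderCalls("hello")); // prints !olleh
    }
}
```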

 length determines the number of characters in a string.
 charAt returns the character at a specific position in the
String.
 getChars copies the characters of a String into a
character array.
▪ The first argument is the starting index in the String from which
characters are to be copied.
▪ The second argument is the index that is one past the last character to
be copied from the String.
▪ The third argument is the character array into which the characters
are to be copied.
▪ The last argument is the starting index where the copied characters
are placed in the target character array.
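The three methods above might be used as in this sketch. StringBasicsDemo, reverse and copyRange are hypothetical names for illustration.

```java
// Hypothetical demo class; not taken from the slides.
class StringBasicsDemo {
    // Uses length and charAt to build the reversed string.
    static String reverse(String s) {
        char[] out = new char[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[s.length() - 1 - i] = s.charAt(i);
        }
        return new String(out);
    }

    // Copies characters begin (inclusive) to end (exclusive) into a new array.
    static char[] copyRange(String s, int begin, int end) {
        char[] target = new char[end - begin];
        // Arguments: source start, one past source end, target array, target start.
        s.getChars(begin, end, target, 0);
        return target;
    }

    public static void main(String[] args) {
        System.out.println(reverse("abc"));                          // prints cba
        System.out.println(new String(copyRange("hello there", 0, 5))); // prints hello
    }
}
```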

 Formatting strings
 System.out.printf("%d - %s", 42, "is the answer to life the universe and everything");
 Converting strings to numbers
 The Number subclasses that wrap primitive numeric types (Byte, Integer, Double, Float, Long, and Short) each provide a class method named valueOf that converts a string to an object of that type.
 Because valueOf returns an object (for example, an Integer) rather than a primitive, call intValue to get the primitive value. The same goes for the other types (byteValue, doubleValue, floatValue, longValue, shortValue).
 int a = Integer.valueOf("42").intValue();
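A sketch of both ideas. It uses String.format, which applies the same format specifiers as printf but returns the result instead of printing it. FormatAndParseDemo and its methods are hypothetical names.

```java
// Hypothetical demo class; not taken from the slides.
class FormatAndParseDemo {
    // %d formats an integer, %s formats a string.
    static String label(int number, String text) {
        return String.format("%d - %s", number, text);
    }

    // valueOf yields a wrapper object; intValue extracts the primitive.
    static int toInt(String text) {
        return Integer.valueOf(text).intValue();
    }

    static double toDouble(String text) {
        return Double.valueOf(text).doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(label(42, "is the answer"));
        System.out.println(toInt("42") + 1);
        System.out.println(toDouble("3.5") * 2);
    }
}
```

If the text is not a valid number, valueOf throws a NumberFormatException, so input from users is usually validated or wrapped in a try block first.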
 Class StringBuilder is used to create and manipulate dynamic
string information. (Modifiable strings)
 Every StringBuilder is capable of storing a number of
characters specified by its capacity.
 If the capacity of a StringBuilder is exceeded, the
capacity expands to accommodate the additional characters.
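The capacity behavior can be observed with methods capacity and length, as in this sketch. CapacityDemo is a hypothetical name, and the exact capacity after growth is deliberately not asserted because it depends on the implementation.

```java
// Hypothetical demo class; not taken from the slides.
class CapacityDemo {
    // Returns true if appending past the initial capacity made the capacity grow.
    static boolean growsWhenExceeded() {
        StringBuilder builder = new StringBuilder(4); // initial capacity of 4 characters
        int before = builder.capacity();
        builder.append("longer than four");           // 16 characters
        return builder.capacity() > before && builder.capacity() >= builder.length();
    }

    public static void main(String[] args) {
        System.out.println(growsWhenExceeded());
    }
}
```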

 Some special characters, such as tab and newline, cannot be typed directly into a string literal.
 Escape sequences (a backslash followed by a character) represent them in strings.
▪ System.out.println("She said:\n\t\"Hello!\"\n to me."); prints:
She said:
    "Hello!"
 to me.
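A sketch that builds the same string and shows where the escape sequences end up. EscapeDemo and message are hypothetical names.

```java
// Hypothetical demo class; not taken from the slides.
class EscapeDemo {
    // \n is a newline, \t is a tab, \" is a double quote inside a string literal.
    static String message() {
        return "She said:\n\t\"Hello!\"\n to me.";
    }

    public static void main(String[] args) {
        System.out.println(message());
        // "She said:" has 9 characters, so the newline is at index 9 and the tab at index 10.
        System.out.println(message().indexOf('\t'));
    }
}
```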

 When you read a sentence, your mind breaks it into
tokens—individual words and punctuation marks that
convey meaning.
 Compilers also perform tokenization.
 String method split breaks a String into its
component tokens and returns an array of Strings.
 Tokens are separated by delimiters
▪ Typically white-space characters such as space, tab, newline
and carriage return.
▪ Other characters can also be used as delimiters to separate
tokens.
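A sketch of tokenizing with split. The argument of split is a regular expression; the delimiter patterns chosen here ("\\s+" for runs of white space, "," for commas) and the names TokenizeDemo, words and fields are assumptions for illustration.

```java
// Hypothetical demo class; not taken from the slides.
class TokenizeDemo {
    // Splits on runs of white space (space, tab, newline, carriage return).
    // trim removes leading white space, which would otherwise produce an empty first token.
    static String[] words(String sentence) {
        return sentence.trim().split("\\s+");
    }

    // Uses a comma as the delimiter instead of white space.
    static String[] fields(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        for (String token : words("  This is a   sentence\twith tokens ")) {
            System.out.println(token);
        }
        System.out.println(fields("a,b,c").length);
    }
}
```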

 A regular expression is a specially formatted String
that describes a search pattern for matching characters
in other Strings.
 Useful for validating input and ensuring that data is in a
particular format.
▪ Validate phone numbers, email or postal addresses,…
▪ Validate file formats
▪ Validate program syntax

A regular expression made up only of ordinary characters matches itself.
There are also special characters with specific meanings.

Since '\' is an escape character in Java string literals, you must write "\\" to insert a single backslash into a string.

 To match one character from a set, use brackets [] together with the special characters - (range) and ^ (negation).
▪ "[aeiou]" matches any single lowercase vowel.
▪ "[A-Y]" matches any single uppercase letter except Z.
▪ "[A-z]" matches any single character with an integer value between uppercase A and lowercase z: all letters, plus the characters [, \, ], ^, _ and ` that lie between Z and a.
▪ "[A-Za-z]" matches any single uppercase or lowercase letter.
▪ If the first character in the brackets is "^", the expression accepts any character other than those indicated.
 "[^Z]" matches any character other than capital Z, including lowercase letters and nonletters such as \n.
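The character classes above can be checked with String method matches, which tests the whole string against the expression. CharClassDemo and its method names are hypothetical.

```java
// Hypothetical demo class; each method tests a one-character string against a character class.
class CharClassDemo {
    static boolean isLowercaseVowel(String s) { return s.matches("[aeiou]"); }
    static boolean inRangeAToLowerZ(String s) { return s.matches("[A-z]"); }
    static boolean isLetter(String s)         { return s.matches("[A-Za-z]"); }
    static boolean isNotCapitalZ(String s)    { return s.matches("[^Z]"); }

    public static void main(String[] args) {
        System.out.println(isLowercaseVowel("e")); // true
        System.out.println(inRangeAToLowerZ("[")); // true: [ lies between Z and a
        System.out.println(isLetter("["));         // false
        System.out.println(isNotCapitalZ("\n"));   // true: a newline is not Z
    }
}
```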

 String method matches receives a String that specifies the regular expression and tests whether the entire contents of the String object on which it's called match the regular expression.
▪ The method returns a boolean indicating whether the match succeeded.
 A regular expression consists of literal characters and special symbols.
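A sketch of input validation with matches. The rule used here (an uppercase letter followed by any number of letters) and the names MatchesDemo and isFirstName are assumptions chosen for illustration.

```java
// Hypothetical demo class; the validation rule is illustrative only.
class MatchesDemo {
    // One uppercase letter followed by zero or more letters; the whole string must match.
    static boolean isFirstName(String s) {
        return s.matches("[A-Z][a-zA-Z]*");
    }

    public static void main(String[] args) {
        System.out.println(isFirstName("Jane"));     // true
        System.out.println(isFirstName("jane"));     // false: must start with an uppercase letter
        System.out.println(isFirstName("Jane Doe")); // false: the space is not allowed
    }
}
```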

 The dot character "." in a regular expression matches any
single character except a newline character.
 The character "|" matches the expression to its left or to
its right.
▪ "Hi (John|Jane)" matches both "Hi John" and "Hi Jane".
 Parentheses are used to group parts of the regular
expression.
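 How to match a literal parenthesis?
▪ Use "\\(" for an opening parenthesis and "\\)" for a closing one.
 In the Java string literal, the first backslash escapes the second, so the regular expression receives \( and the backslash before the parenthesis prevents it from being interpreted as a special regular-expression character.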
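A sketch that exercises the dot, alternation with grouping, and an escaped parenthesis. GroupingDemo and its methods are hypothetical; removing parentheses with replaceAll is just one convenient way to show the escaped form.

```java
// Hypothetical demo class; not taken from the slides.
class GroupingDemo {
    // Alternation inside a group: "Hi " followed by John or Jane.
    static boolean greets(String s) {
        return s.matches("Hi (John|Jane)");
    }

    // Three dots: exactly three characters, none of which is a newline.
    static boolean isThreeChars(String s) {
        return s.matches("...");
    }

    // "\\(|\\)" matches a literal opening or closing parenthesis.
    static String stripParentheses(String s) {
        return s.replaceAll("\\(|\\)", "");
    }

    public static void main(String[] args) {
        System.out.println(greets("Hi Jane"));        // true
        System.out.println(isThreeChars("a\nb"));     // false: the dot does not match \n
        System.out.println(stripParentheses("f(x)")); // prints fx
    }
}
```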

 Quantifiers describe how many times a pattern repeats.
 "\\d{5,7}" matches any sequence of five to seven digits.
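A sketch using the range quantifier from the slide together with ? (zero or one) and + (one or more), which are standard regular-expression quantifiers. QuantifierDemo and its methods are hypothetical names.

```java
// Hypothetical demo class; not taken from the slides.
class QuantifierDemo {
    // {5,7}: at least five and at most seven digits.
    static boolean isFiveToSevenDigits(String s) {
        return s.matches("\\d{5,7}");
    }

    // -? is an optional minus sign; \d+ is one or more digits.
    static boolean isInteger(String s) {
        return s.matches("-?\\d+");
    }

    public static void main(String[] args) {
        System.out.println(isFiveToSevenDigits("12345"));    // true
        System.out.println(isFiveToSevenDigits("12345678")); // false: eight digits
        System.out.println(isInteger("-42"));                // true
    }
}
```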

 Sometimes it’s useful to replace parts of a string or to
split a string into pieces. For this purpose, class
String provides methods replaceAll, replaceFirst and
split.
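A sketch of the three methods. The sample patterns and the names ReplaceSplitDemo, maskDigits, replaceFirstNumber and splitOnCommas are assumptions for illustration.

```java
// Hypothetical demo class; not taken from the slides.
class ReplaceSplitDemo {
    // replaceAll replaces every match of the regular expression.
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "#");
    }

    // replaceFirst replaces only the first match.
    static String replaceFirstNumber(String s) {
        return s.replaceFirst("\\d+", "N");
    }

    // split uses the regular expression as the delimiter: a comma plus any following white space.
    static String[] splitOnCommas(String s) {
        return s.split(",\\s*");
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("a1b22"));         // prints a#b##
        System.out.println(replaceFirstNumber("a1b22")); // prints aNb22
        System.out.println(splitOnCommas("1, 2,3,   4").length); // prints 4
    }
}
```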

 In addition to the regular-expression capabilities of class
String, Java provides other classes in package
java.util.regex that help developers manipulate regular
expressions.
 Class Pattern represents a regular expression.
 Class Matcher contains both a regular-expression pattern and a
CharSequence in which to search for the pattern.
 CharSequence (package java.lang) is an interface that allows
read access to a sequence of characters.
 The interface requires that the methods charAt, length,
subSequence and toString be declared.
 Both String and StringBuilder implement interface
CharSequence, so an instance of either of these classes can be
used with class Matcher.
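Because the search object is a CharSequence, one method can serve both String and StringBuilder arguments, as this sketch suggests. CharSequenceDemo and containsDigit are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not taken from the slides.
class CharSequenceDemo {
    // Accepts any CharSequence, so String and StringBuilder both work.
    static boolean containsDigit(CharSequence text) {
        Matcher matcher = Pattern.compile("\\d").matcher(text);
        return matcher.find();
    }

    public static void main(String[] args) {
        System.out.println(containsDigit("abc1"));                   // String argument
        System.out.println(containsDigit(new StringBuilder("abc"))); // StringBuilder argument
    }
}
```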

 If a regular expression will be used only once, static
Pattern method matches can be used.
▪ Takes a String that specifies the regular expression and a
CharSequence on which to perform the match.
▪ Returns a boolean indicating whether the search object (the
second argument) matches the regular expression.
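A sketch of a one-off check with static method Pattern.matches. The date layout (four digits, two digits, two digits, separated by hyphens) and the names OneOffMatchDemo and looksLikeDate are assumptions; the pattern checks only the layout, not whether the date is valid.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not taken from the slides.
class OneOffMatchDemo {
    // First argument: the regular expression. Second argument: the CharSequence to test.
    static boolean looksLikeDate(CharSequence text) {
        return Pattern.matches("\\d{4}-\\d{2}-\\d{2}", text);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDate("2015-01-31")); // true
        System.out.println(looksLikeDate("2015-1-31"));  // false: month has one digit
    }
}
```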

 If a regular expression will be used more than once, it’s
more efficient to use static Pattern method compile
to create a specific Pattern object for that regular
expression.
▪ Method compile receives a String representing the pattern and returns a new Pattern object, which can then be used to call method matcher.
▪ Method matcher receives a CharSequence to search and returns a Matcher object.
 Matcher method matches performs the same task as
Pattern method matches, but receives no arguments—
the search pattern and search object are encapsulated in the
Matcher object.
 Class Matcher provides other methods, including find,
lookingAt, replaceFirst and replaceAll.
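A sketch of compiling a pattern once and reusing it for several inputs. The five-digit pattern and the names CompiledPatternDemo, FIVE_DIGITS and countValid are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not taken from the slides.
class CompiledPatternDemo {
    // Compiled once, then reused for every candidate.
    private static final Pattern FIVE_DIGITS = Pattern.compile("\\d{5}");

    // Counts the candidates that consist of exactly five digits.
    static int countValid(String[] candidates) {
        int count = 0;
        for (String candidate : candidates) {
            Matcher matcher = FIVE_DIGITS.matcher(candidate);
            if (matcher.matches()) { // no arguments: pattern and input are in the Matcher
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] candidates = {"12345", "1234", "abcde", "54321"};
        System.out.println(countValid(candidates)); // prints 2
    }
}
```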

 Matcher method find attempts to match a piece of
the search object to the search pattern.
▪ Each call to this method starts at the point where the last call
ended, so multiple matches can be found.
 Matcher method lookingAt also matches only part of the search object, but the match must start at the beginning of the search object: it succeeds only if the pattern matches there, and it never skips ahead to a later position.
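A sketch contrasting the two methods; it also uses Matcher method group to retrieve the text of each match. FindDemo, allNumbers and startsWithNumber are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not taken from the slides.
class FindDemo {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Each call to find resumes after the previous match, so the loop visits every match.
    static List<String> allNumbers(CharSequence text) {
        Matcher matcher = NUMBER.matcher(text);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group()); // text of the most recent match
        }
        return found;
    }

    // lookingAt succeeds only if a match begins at the start of the text.
    static boolean startsWithNumber(CharSequence text) {
        return NUMBER.matcher(text).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(allNumbers("a1 b22 c333"));   // prints [1, 22, 333]
        System.out.println(startsWithNumber("42 is it")); // true
        System.out.println(startsWithNumber("is it 42")); // false
    }
}
```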

 Matcher method group returns the String from the
search object that matches the search pattern.
▪ The String that is returned is the one that was last matched
by a call to find or lookingAt.
 As you’ll see in Section 17.7, you can combine regular-
expression processing with Java SE 8 lambdas and
streams to implement powerful String-and-file
processing applications.
