CMSC 311
Fall 2002
There are two reasons for giving this project. First, I think tokenizing is a useful skill that everyone should have; breaking up a line is something you do all the time when working with text.
Second, the rest of the projects require you to manipulate text. Many students have probably used all sorts of "cin" tricks to read the input, and ended up with code that was difficult to read, write, and fix.
The idea of these methods is to emulate features of the StringTokenizer in Java. Of course, this isn't the same thing, but it's close enough.
To process a file, read it one line at a time, break each line into tokens using the methods below, and then process the tokens.
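A minimal sketch of that read-a-line, tokenize, process loop follows. It reads from any std::istream with std::getline. The names splitOnBlanks and tokenizeStream are hypothetical, and splitOnBlanks is only a stand-in for the Tokenizer methods: it uses std::istringstream, whose >> operator treats any whitespace (not just spaces) as a separator, exactly as cin does.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a Tokenizer call: >> on an istringstream skips
// whitespace and reads one word at a time, just like cin.
std::vector<std::string> splitOnBlanks( const std::string & line )
{
    std::istringstream words( line ) ;
    std::vector<std::string> tokens ;
    std::string word ;
    while ( words >> word )
        tokens.push_back( word ) ;
    return tokens ;
}

// Read one line at a time, break it into tokens, then "process" the tokens.
// Here processing just collects each non-blank line's tokens.
std::vector< std::vector<std::string> > tokenizeStream( std::istream & in )
{
    std::vector< std::vector<std::string> > lines ;
    std::string line ;
    while ( std::getline( in, line ) )
    {
        std::vector<std::string> tokens = splitOnBlanks( line ) ;
        if ( !tokens.empty() )
            lines.push_back( tokens ) ;
    }
    return lines ;
}
```

To read an actual file, open it with std::ifstream and pass that stream to tokenizeStream.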
Tokenizer
static string leftTrim( const string & str ) ;
Given parameter str, removes any leading blank spaces. For example, if str is " hello, world ", then the string returned should be "hello, world ". Only the leading spaces have been removed. The remaining spaces are unchanged. If there are no leading spaces, just return the string as is (an empty string falls into this category).
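One way leftTrim could be written, as a sketch: it is shown as a free function rather than a static member of Tokenizer, and it assumes "blank" means the space character ' ' only (tabs are left alone).

```cpp
#include <string>

// Sketch of leftTrim; only ' ' counts as a blank (an assumption).
std::string leftTrim( const std::string & str )
{
    // Index of the first non-blank character, or npos if there is none.
    std::string::size_type first = str.find_first_not_of( ' ' ) ;
    if ( first == std::string::npos )
        return std::string() ;       // empty or all blanks: nothing is left
    return str.substr( first ) ;     // trailing spaces are untouched
}
```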
static string rightTrim( const string & str ) ;
Given parameter str, removes any trailing blank spaces. For example, if str is " hello, world ", then the string returned should be " hello, world". Only the trailing spaces have been removed. The remaining spaces are unchanged. If there are no trailing spaces, just return the string as is (an empty string falls into this category).
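The mirror-image sketch for rightTrim, under the same assumptions (free function, only ' ' counts as blank):

```cpp
#include <string>

// Sketch of rightTrim; only ' ' counts as a blank (an assumption).
std::string rightTrim( const std::string & str )
{
    // Index of the last non-blank character, or npos if there is none.
    std::string::size_type last = str.find_last_not_of( ' ' ) ;
    if ( last == std::string::npos )
        return std::string() ;          // empty or all blanks
    return str.substr( 0, last + 1 ) ;  // leading spaces are untouched
}
```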
static string trim( const string & str ) ;
Given parameter str, removes any leading AND trailing blank spaces.
For example, if str is " hello, world ", then the
string returned should be "hello, world". Only the leading and trailing
spaces have been removed. The remaining spaces are unchanged.
If there are no leading or trailing spaces, just return the string as is
(an empty string falls into this category).
Use leftTrim and rightTrim to implement this static method.
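As a sketch, trim is just the composition of the two trims. Compact copies of leftTrim and rightTrim (same ' '-only assumption) are repeated so the block compiles on its own.

```cpp
#include <string>

std::string leftTrim( const std::string & str )
{
    std::string::size_type first = str.find_first_not_of( ' ' ) ;
    return first == std::string::npos ? std::string() : str.substr( first ) ;
}

std::string rightTrim( const std::string & str )
{
    std::string::size_type last = str.find_last_not_of( ' ' ) ;
    return last == std::string::npos ? std::string() : str.substr( 0, last + 1 ) ;
}

// Remove leading blanks first, then trailing blanks.
std::string trim( const std::string & str )
{
    return rightTrim( leftTrim( str ) ) ;
}
```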
static bool isBlankLine( const std::string & str ) ;
Returns true if the line consists of zero or more blanks (so an empty line counts as blank). Returns false if there is any non-blank character.
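Since "zero or more blanks" includes the empty string, a compact sketch simply asks whether any non-blank character exists. Treating only ' ' as blank is an assumption carried over from the trim sketches.

```cpp
#include <string>

// True for "" and for strings made only of ' '; false otherwise.
bool isBlankLine( const std::string & str )
{
    return str.find_first_not_of( ' ' ) == std::string::npos ;
}
```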
static std::string stripComment( const std::string & str, const std::string & commentStr ) ;
Finds the leftmost occurrence of commentStr that is not enclosed
in single or double quotes, and removes everything from that
occurrence to the end of the string. For example, if the string were "add $r1, $r2, $r3 # adds",
and the comment string were "#", then the return value would be:
"add $r1, $r2, $r3 ".
If there is no such occurrence of commentStr, just return the string itself.
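The tricky part of stripComment is ignoring comment markers inside quotes. This sketch tracks, while scanning left to right, whether it is inside a single- or double-quoted run. It assumes quotes are never escaped and do not nest (a double quote inside single quotes is an ordinary character), and that an unmatched quote stays open to the end of the line.

```cpp
#include <string>

std::string stripComment( const std::string & str, const std::string & commentStr )
{
    if ( commentStr.empty() )
        return str ;
    char openQuote = '\0' ;   // '\0' means "not inside quotes"
    for ( std::string::size_type i = 0 ; i < str.size() ; ++i )
    {
        char c = str[ i ] ;
        if ( openQuote != '\0' )
        {
            if ( c == openQuote )
                openQuote = '\0' ;       // closing quote
        }
        else if ( c == '"' || c == '\'' )
        {
            openQuote = c ;              // opening quote
        }
        else if ( str.compare( i, commentStr.size(), commentStr ) == 0 )
        {
            return str.substr( 0, i ) ;  // drop the comment through end of line
        }
    }
    return str ;                         // no unquoted comment marker
}
```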
static string getLeftToken( string & str, const string & delimiter ) ;
A delimiter is a separator. Often spaces are used to
separate words. Sometimes commas are used. You might have
colons, slashes, etc. I will also assume you might use
more than one character as a delimiter. For example, you might
use double colons.
The behavior of this method depends on what the delimiter is. If the delimiter is a single space or multiple spaces, get the leftmost token by skipping any leading blanks and then reading characters up to, but not including, the next blank (or the end of the string). Whatever follows, starting with that blank, is left in str.
For example, if you have " cat dog ", then once the call is made to getLeftToken(), you should return "cat", and str, the parameter, will be " dog ". This behaves very much like "cin" when reading in a string. If the string is empty or contains only blanks, return the empty string.
If the delimiter is anything else, search for the first delimiter not enclosed in single or double quotes. Read up to that delimiter and stop. Should the delimiter not exist, return the whole string as the token. For example, suppose the delimiter is ",". You have " ok "one, two, three" , four". The left (first) token should be "ok "one, two, three" ", and str should be " four". The delimiter is not kept.
Also, "escaped" delimiters (a delimiter preceded by a backslash) don't count. For example, if you have the string "paper \, scissors, rock", then the first (left) token is "paper , scissors" and str is "rock". Notice that escaped delimiters are replaced by the actual delimiter. Backslashes can also be delimiters, which means they can also be escaped.
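A sketch of getLeftToken covering both modes is below. The helper isBlankDelimiter is hypothetical. Assumptions: quotes are not nested or escaped and are copied into the token as-is; escapes are only recognized outside quotes; when no real delimiter is found (or the string is all blanks) str becomes empty; an empty delimiter never splits. This sketch does not trim either piece, so in the escaped-comma example it leaves " rock", with its leading blank, in str; trimming is left to split().

```cpp
#include <string>

// Hypothetical helper: true when the delimiter consists only of spaces.
static bool isBlankDelimiter( const std::string & d )
{
    return !d.empty() && d.find_first_not_of( ' ' ) == std::string::npos ;
}

std::string getLeftToken( std::string & str, const std::string & delimiter )
{
    typedef std::string::size_type Index ;
    if ( delimiter.empty() )
    {
        std::string all = str ;   // assumption: an empty delimiter never splits
        str = "" ;
        return all ;
    }
    if ( isBlankDelimiter( delimiter ) )
    {
        // cin-like mode: skip leading blanks, then read up to the next blank.
        Index start = str.find_first_not_of( ' ' ) ;
        if ( start == std::string::npos )
        {
            str = "" ;            // assumption: an all-blank string is consumed
            return "" ;
        }
        Index end = str.find( ' ', start ) ;
        if ( end == std::string::npos )
            end = str.size() ;
        std::string token = str.substr( start, end - start ) ;
        str = str.substr( end ) ; // the blank that ended the token stays in str
        return token ;
    }

    std::string token ;
    char openQuote = '\0' ;       // '\0' means "not inside quotes"
    Index i = 0 ;
    while ( i < str.size() )
    {
        char c = str[ i ] ;
        if ( openQuote != '\0' )
        {
            if ( c == openQuote )
                openQuote = '\0' ; // closing quote
            token += c ;           // quoted text, quotes included, is kept verbatim
            ++i ;
        }
        else if ( c == '"' || c == '\'' )
        {
            openQuote = c ;        // opening quote
            token += c ;
            ++i ;
        }
        else if ( c == '\\' && str.compare( i + 1, delimiter.size(), delimiter ) == 0 )
        {
            token += delimiter ;   // escaped delimiter: drop the backslash, keep the delimiter
            i += 1 + delimiter.size() ;
        }
        else if ( str.compare( i, delimiter.size(), delimiter ) == 0 )
        {
            str = str.substr( i + delimiter.size() ) ; // the delimiter itself is discarded
            return token ;
        }
        else
        {
            token += c ;
            ++i ;
        }
    }
    str = "" ;                     // no real delimiter: the whole string was the token
    return token ;
}
```

With a single backslash as the delimiter, "\\" (two backslashes in the text) is read as an escaped delimiter and becomes one literal backslash in the token.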
static string getRightToken( string & str, const string & delimiter ) ;
Similar to getLeftToken(), except you get the token from the right side. Basically, process the string right to left instead of left to right.
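A sketch of getRightToken follows, with one deliberate substitution: for non-blank delimiters it does not literally scan right to left. It scans left to right with the same quote and escape rules as the getLeftToken sketch and remembers the last real delimiter, which keeps the quote and escape handling identical in both directions. The blank-delimiter mode mirrors the cin-like mode. isBlankDelimiter is a hypothetical helper, and the other assumptions match the getLeftToken sketch; the left part is left raw (still escaped) so later calls can process it.

```cpp
#include <string>

// Hypothetical helper: true when the delimiter consists only of spaces.
static bool isBlankDelimiter( const std::string & d )
{
    return !d.empty() && d.find_first_not_of( ' ' ) == std::string::npos ;
}

std::string getRightToken( std::string & str, const std::string & delimiter )
{
    typedef std::string::size_type Index ;
    if ( delimiter.empty() )
    {
        std::string all = str ;   // assumption: an empty delimiter never splits
        str = "" ;
        return all ;
    }
    if ( isBlankDelimiter( delimiter ) )
    {
        // Mirror of cin-like mode: skip trailing blanks, read back to the previous blank.
        Index last = str.find_last_not_of( ' ' ) ;
        if ( last == std::string::npos )
        {
            str = "" ;
            return "" ;
        }
        Index blank = str.rfind( ' ', last ) ;
        Index start = ( blank == std::string::npos ) ? 0 : blank + 1 ;
        std::string token = str.substr( start, last + 1 - start ) ;
        str = str.substr( 0, start ) ;  // the blank before the token stays in str
        return token ;
    }

    std::string token ;            // text after the most recent real delimiter
    Index lastDelim = std::string::npos ;
    char openQuote = '\0' ;        // '\0' means "not inside quotes"
    Index i = 0 ;
    while ( i < str.size() )
    {
        char c = str[ i ] ;
        if ( openQuote != '\0' )
        {
            if ( c == openQuote ) openQuote = '\0' ;
            token += c ;
            ++i ;
        }
        else if ( c == '"' || c == '\'' )
        {
            openQuote = c ;
            token += c ;
            ++i ;
        }
        else if ( c == '\\' && str.compare( i + 1, delimiter.size(), delimiter ) == 0 )
        {
            token += delimiter ;   // escaped delimiter becomes a literal delimiter
            i += 1 + delimiter.size() ;
        }
        else if ( str.compare( i, delimiter.size(), delimiter ) == 0 )
        {
            lastDelim = i ;        // a later real delimiter replaces this one
            token.clear() ;
            i += delimiter.size() ;
        }
        else
        {
            token += c ;
            ++i ;
        }
    }
    // The left part keeps its raw text; the delimiter itself is dropped.
    str = ( lastDelim == std::string::npos ) ? std::string() : str.substr( 0, lastDelim ) ;
    return token ;
}
```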
static int findRealDelimiter( const std::string & str, const std::string & delimiter ) ;
This is a useful helper function. It returns the index of the first character of the leftmost delimiter that is not enclosed in single or double quotes. If there is no such delimiter, return -1.
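A sketch of findRealDelimiter using the same quote tracking as the stripComment sketch. It follows the description literally and skips only quoted delimiters; if you use it inside getLeftToken, you would also need to skip backslash-escaped delimiters. Returning -1 for an empty delimiter is an assumption.

```cpp
#include <string>

int findRealDelimiter( const std::string & str, const std::string & delimiter )
{
    if ( delimiter.empty() )
        return -1 ;                       // assumption: an empty delimiter never matches
    char openQuote = '\0' ;               // '\0' means "not inside quotes"
    for ( std::string::size_type i = 0 ; i < str.size() ; ++i )
    {
        char c = str[ i ] ;
        if ( openQuote != '\0' )
        {
            if ( c == openQuote )
                openQuote = '\0' ;        // closing quote
        }
        else if ( c == '"' || c == '\'' )
            openQuote = c ;               // opening quote
        else if ( str.compare( i, delimiter.size(), delimiter ) == 0 )
            return static_cast<int>( i ) ; // first unquoted delimiter
    }
    return -1 ;
}
```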
static vector<string> split( const std::string & str, const std::string & delimiter, bool trimToken = true ) ;
Calls getLeftToken() repeatedly to break a string into tokens.
As each token is pulled out, if trimToken is true, call
trim() on the token to remove leading and trailing blanks.
Be careful. If the string were "h,,", you would have 3 tokens: "h", "", and "". For delimiters that are not blanks, the number of tokens should be one more than the number of "real" delimiters.
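The description builds split on repeated getLeftToken calls. The sketch below swaps that for a single left-to-right pass with the same quote and escape rules as the getLeftToken sketch, which makes the "h,," case easy to get right: every real delimiter ends one token, and the text after the last delimiter is always one more token, even when it is empty. For blank delimiters it behaves like cin and produces no empty tokens. As in the earlier sketches, only ' ' counts as blank, and an empty delimiter is an assumed edge case that returns the whole string.

```cpp
#include <string>
#include <vector>

// Compact trim (spaces only), repeated so the sketch compiles on its own.
static std::string trim( const std::string & s )
{
    std::string::size_type first = s.find_first_not_of( ' ' ) ;
    if ( first == std::string::npos ) return std::string() ;
    std::string::size_type last = s.find_last_not_of( ' ' ) ;
    return s.substr( first, last - first + 1 ) ;
}

std::vector<std::string> split( const std::string & str, const std::string & delimiter,
                                bool trimToken = true )
{
    std::vector<std::string> tokens ;
    std::string current ;
    if ( delimiter.empty() )
    {
        tokens.push_back( trimToken ? trim( str ) : str ) ;
        return tokens ;
    }
    if ( delimiter.find_first_not_of( ' ' ) == std::string::npos )
    {
        // Blank delimiter: runs of blanks separate words; no empty tokens.
        for ( std::string::size_type i = 0 ; i <= str.size() ; ++i )
        {
            if ( i == str.size() || str[ i ] == ' ' )
            {
                if ( !current.empty() ) tokens.push_back( current ) ;
                current.clear() ;
            }
            else
                current += str[ i ] ;
        }
        return tokens ;
    }
    char openQuote = '\0' ;   // '\0' means "not inside quotes"
    std::string::size_type i = 0 ;
    while ( i < str.size() )
    {
        char c = str[ i ] ;
        if ( openQuote != '\0' )
        { if ( c == openQuote ) openQuote = '\0' ; current += c ; ++i ; }
        else if ( c == '"' || c == '\'' )
        { openQuote = c ; current += c ; ++i ; }
        else if ( c == '\\' && str.compare( i + 1, delimiter.size(), delimiter ) == 0 )
        { current += delimiter ; i += 1 + delimiter.size() ; }  // escaped delimiter
        else if ( str.compare( i, delimiter.size(), delimiter ) == 0 )
        {
            tokens.push_back( trimToken ? trim( current ) : current ) ; // real delimiter ends a token
            current.clear() ;
            i += delimiter.size() ;
        }
        else
        { current += c ; ++i ; }
    }
    tokens.push_back( trimToken ? trim( current ) : current ) ;  // token after the last delimiter
    return tokens ;
}
```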
static std::string removeEnd( const std::string & str, char left, char right ) ;
Suppose you had a string like " [ cat ]", and you want to remove
the brackets. There may be leading or trailing spaces. So, specify
left as '[' and right as ']'. This returns
" cat ".
More technically, it removes leading white space and then left, should it exist. It removes trailing white space and then right, should it exist. Should left not exist, then this function just does a left trim. Should right not exist, then it just does a right trim (see the two methods above). Thus, if the string were " cat dog " and left and right were '[' and ']' respectively, then the result would be "cat dog".
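A sketch of removeEnd that follows the "more technically" description step by step: left-trim, drop left if it is now the first character, right-trim, then drop right if it is now the last character. Removing at most one bracket on each side, and treating only ' ' as white space, are assumptions.

```cpp
#include <string>

std::string removeEnd( const std::string & str, char left, char right )
{
    std::string s = str ;
    // Left end: drop leading blanks, then one copy of left if present.
    std::string::size_type first = s.find_first_not_of( ' ' ) ;
    s = ( first == std::string::npos ) ? std::string() : s.substr( first ) ;
    if ( !s.empty() && s[ 0 ] == left )
        s.erase( 0, 1 ) ;
    // Right end: drop trailing blanks, then one copy of right if present.
    std::string::size_type last = s.find_last_not_of( ' ' ) ;
    s = ( last == std::string::npos ) ? std::string() : s.substr( 0, last + 1 ) ;
    if ( !s.empty() && s[ s.size() - 1 ] == right )
        s.erase( s.size() - 1 ) ;
    return s ;
}
```

Note that the spaces just inside the brackets survive, which is why " [ cat ]" becomes " cat " rather than "cat".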
Here's an example:
vector<string> tokens = Tokenizer::split( str, "," ) ;
See the class syllabus for policies concerning email.
Last Modified: Wed Mar 13 20:20:09 EST 2002