cs132.markov
Class MarkovText
java.lang.Object
cs132.markov.MarkovText
public class MarkovText
- extends java.lang.Object
A MarkovText represents a Markov state transition table
of a specific order that has been trained on some text.
When you create the MarkovText, you specify the order of the Markov model.
You can then train the MarkovModel on any number of text sequences. Having
done this, you can invoke markovIterator to generate an Iterator
that will generate a random sequence of
For example, in a 2nd order Markov state transition table, you should be
able to determine, for any two words/Strings, the DenseBag of the
words/Strings that follow that sequence of two words/Strings in the training
sequence.
It is strongly recommended that you use a Map from List to DenseBag
to represent the Markov transitions. As part of your design, you have to
figure out how to handle the start and end of training sequences.
One possibility is to use the empty string (which never occurs in training
sequences) as a special marker. For example, using a first order markov model
trained on the sequence [ "a", "a", "b" ], we might generate the map:
[""] = {"a" = 1}
["a"] = {"a" = 1, "b" = 1}
["b"] = {"" = 1}
For a second order markov chain trained on the sequence [ "a", "a", "a", "b" ],
we would generate the map:
["", ""] = {"a" = 1}
["", "a"] = {"a" = 1}
["a", "a"] = {"a" = 1, "b" = 1}
["a", "b"] = {"" = 1}
|
Method Summary |
DenseBag<java.lang.String> |
get(java.util.List<java.lang.String> probe)
Given a list of words, return the dense bag of the words that occur
immediately after occurrences of that list in the training sequence. |
static void |
main(java.lang.String[] args)
This method just shows a sample way to use MarkovText to generate randomized text. |
java.util.Iterator<java.lang.String> |
markovIterator(java.util.Random r)
Given a Markov transition table of a specified order, return an Iterator
that will enumerate elements according to the Markov transition diagram. |
java.lang.String |
toString()
Generate a String representation of the the Markov model. |
void |
updateMarkovTransitions(java.util.Iterator<java.lang.String> words)
Update/train a Markov state transition table on the provided Iterator. |
| Methods inherited from class java.lang.Object |
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
MarkovText
public MarkovText(int o)
get
public DenseBag<java.lang.String> get(java.util.List<java.lang.String> probe)
- Given a list of words, return the dense bag of the words that occur
immediately after occurrences of that list in the training sequence.
Note: this method is for testing purposes and has limitations that
prevent it being generally used. For example, you can't use this method
to determine what word would be at the beginning of a generated sequence.
- Parameters:
probe - list of words
- Returns:
- dense bag of strings that occur immediately after any
occurrence of the probe in the training sequence.
main
public static void main(java.lang.String[] args)
throws java.lang.Exception
- This method just shows a sample way to use MarkovText to generate randomized text.
This method will not be tested.
- Throws:
java.lang.Exception
markovIterator
public java.util.Iterator<java.lang.String> markovIterator(java.util.Random r)
- Given a Markov transition table of a specified order, return an Iterator
that will enumerate elements according to the Markov transition diagram.
The iterator must be lazy; it must not actually generate a list of words
from the Markov transition table, and then just return an iterator over
this list.
toString
public java.lang.String toString()
- Generate a String representation of the the Markov model. This will be
useful for your own debugging purposes, but will not be tested.
- Overrides:
toString in class java.lang.Object
updateMarkovTransitions
public void updateMarkovTransitions(java.util.Iterator<java.lang.String> words)
- Update/train a Markov state transition table on the provided Iterator.
You may assume that all the values generated by the Iterator are non-null
and have a length > 0.
For each word w generated by the iterator, the method should:
- Create a List consisting of the (order) words before w (e.g., if
order=2, create a list consisting of the two words before w).
- Look in the transitionTable to find the Bag corresponding to that
list.
- If no bag exists, add an empty bag. Be sure that when updating the
transitionTable, you use as a key a List that will never change.
- Add w to the bag
You have to handle two special cases: the first and last symbols in the
list. You have freedom in how you handle it, but you must, for example,
be able to determine the DenseBag of words/Strings that
should start a Markov generated text sequence, and determine when a
Markov generated sequence should end.
- Parameters:
words - -
sequence of words to use to update