cs132.markov
Class MarkovText

java.lang.Object
  extended by cs132.markov.MarkovText

public class MarkovText
extends java.lang.Object

A MarkovText represents a Markov state transition table of a specific order that has been trained on some text. When you create the MarkovText, you specify the order of the Markov model. You can then train the MarkovModel on any number of text sequences. Having done this, you can invoke markovIterator to generate an Iterator that will generate a random sequence of For example, in a 2nd order Markov state transition table, you should be able to determine, for any two words/Strings, the DenseBag of the words/Strings that follow that sequence of two words/Strings in the training sequence. It is strongly recommended that you use a Map from List to DenseBag to represent the Markov transitions. As part of your design, you have to figure out how to handle the start and end of training sequences. One possibility is to use the empty string (which never occurs in training sequences) as a special marker. For example, using a first order markov model trained on the sequence [ "a", "a", "b" ], we might generate the map:

  [""] = {"a" = 1}
  ["a"] = {"a" = 1, "b" = 1}
  ["b"] = {"" = 1}
 
For a second order markov chain trained on the sequence [ "a", "a", "a", "b" ], we would generate the map:
  ["", ""] = {"a" = 1}
  ["", "a"] = {"a" = 1}
  ["a", "a"] = {"a" = 1, "b" = 1}
  ["a", "b"] = {"" = 1}
 


Constructor Summary
MarkovText(int o)
           
 
Method Summary
 DenseBag<java.lang.String> get(java.util.List<java.lang.String> probe)
          Given a list of words, return the dense bag of the words that occur immediately after occurrences of that list in the training sequence.
static void main(java.lang.String[] args)
          This method just shows a sample way to use MarkovText to generate randomized text.
 java.util.Iterator<java.lang.String> markovIterator(java.util.Random r)
          Given a Markov transition table of a specified order, return an Iterator that will enumerate elements according to the Markov transition diagram.
 java.lang.String toString()
          Generate a String representation of the the Markov model.
 void updateMarkovTransitions(java.util.Iterator<java.lang.String> words)
          Update/train a Markov state transition table on the provided Iterator.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

MarkovText

public MarkovText(int o)
Method Detail

get

public DenseBag<java.lang.String> get(java.util.List<java.lang.String> probe)
Given a list of words, return the dense bag of the words that occur immediately after occurrences of that list in the training sequence. Note: this method is for testing purposes and has limitations that prevent it being generally used. For example, you can't use this method to determine what word would be at the beginning of a generated sequence.

Parameters:
probe - list of words
Returns:
dense bag of strings that occur immediately after any occurrence of the probe in the training sequence.

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
This method just shows a sample way to use MarkovText to generate randomized text. This method will not be tested.

Throws:
java.lang.Exception

markovIterator

public java.util.Iterator<java.lang.String> markovIterator(java.util.Random r)
Given a Markov transition table of a specified order, return an Iterator that will enumerate elements according to the Markov transition diagram. The iterator must be lazy; it must not actually generate a list of words from the Markov transition table, and then just return an iterator over this list.


toString

public java.lang.String toString()
Generate a String representation of the the Markov model. This will be useful for your own debugging purposes, but will not be tested.

Overrides:
toString in class java.lang.Object

updateMarkovTransitions

public void updateMarkovTransitions(java.util.Iterator<java.lang.String> words)
Update/train a Markov state transition table on the provided Iterator. You may assume that all the values generated by the Iterator are non-null and have a length > 0. For each word w generated by the iterator, the method should:
  • Create a List consisting of the (order) words before w (e.g., if order=2, create a list consisting of the two words before w).
  • Look in the transitionTable to find the Bag corresponding to that list.
    • If no bag exists, add an empty bag. Be sure that when updating the transitionTable, you use as a key a List that will never change.
  • Add w to the bag
You have to handle two special cases: the first and last symbols in the list. You have freedom in how you handle it, but you must, for example, be able to determine the DenseBag of words/Strings that should start a Markov generated text sequence, and determine when a Markov generated sequence should end.

Parameters:
words - - sequence of words to use to update