next up previous
Next: Campaign Trip Planning Up: UMD HPC98 Questions Previous: ASCII Art

String Similarity

Finding similarities in character strings is an important problem in text processing and data mining. It has applications in genetics research as well, since strands of DNA can be expressed as very long strings of the characters C, G, T, and A.

A string s is said to be a subsequence of string t if the characters of s appear in order within t, but possibly with gaps between occurrences of each character. For example, ``COAT'' is a subsequence of ``ACROBATS'' since these characters occur in the same order within this string ``ARBS''. ``COBRA'' is not a subsequence of ``ACROBATS'', because even though all the characters appear, they do not appear in the proper order, and ``COBBS'' is not a subsequence because a character cannot be used twice.

To make this formal, break s and t into their individual characters as: tex2html_wrap_inline372 and let tex2html_wrap_inline374 . We define s to be a subsequence of t if there exists a sequence of integers tex2html_wrap_inline380 such that for tex2html_wrap_inline382 , we have tex2html_wrap_inline384 .

One common way to measure the similarity of two character strings is to determine the length of their longest matching subsequence. For example, the sequences ``ABRACADABRA'' and ``YABBADABBADOO'' share the subsequence ``ABADABA'', whose length is 7.


For this problem you will input two character strings and output their longest matching subsequence and its length. In some cases there may be more than one longest subsequence, in which case you may output any of them. (For example, given the strings ``ABCD'' and ``ACBD'' your program could output either ``ABD'' or ``ACD''.)

Input format

The input consists of two lines, each containing a string. Each string will contain from 0 to at most 15 characters.

Output format

The output consists of four lines. On the first and second lines, output the two input strings after the headers ``First string: '' and ``Second string: '', respectively. On the third line output ``Similarity: '' and the length of the longest matching subsequence. On the second line output ``LMS: '' and the longest matching subsequence.





First string: ABRACADABRA
Second string: YABBADABBADOO
Similarity:  7

Test data used in judging

Input 1 Output 1
Input 2 Output 2
Input 3 Output 3
Input 4 Output 4

Our solution

  • Slow
  • Fast

    next up previous
    Next: Campaign Trip Planning Up: UMD HPC98 Questions Previous: ASCII Art

    Chau-Wen Tseng
    Tue Mar 24 16:22:12 EST 1998