The data stream have symbols a, b,

c. This all my symbol alphabet essentially has only four symbols in it,

just four to simplify the example, right?

I am also given a query pattern.

The query pattern is abbcc.

Essentially, what I'm interested in finding

all the occurrences of abbcc on my data sequence,

and I want to do this efficiently.

The main way of implementing this is straightforward.

As at the brute force is essentially,

take the sequence of abccc,

five character sequence, right?

And align this pattern with all possible

five character subsequences that we

have on the data sequence, right?

If I do that, eventually,

I will align with abbcc,

and I will find one match.

If I continue my alignment process,

eventually, I will say align this with the next instance of abccc.

I will have my second match.

I will continue with this process,

and I will not find any other alignment.

I will say, "Well,

in this long data sequence,

there are two instances of the query pattern."

Note that this is very simple to implement, very simple process.

However, it's very costly.

For example, if I am given a data sequence of length N,

and if I'm given a query a sequence of length M or a query pattern of length M,

the cost of the search is going to be N times M. In this example,

my data sequence has 60 characters or 60 symbols.

My query pattern has five characters or five symbols or five events.

The cost of this will be a total of 60 times five,

300 comparisons, which is large.

Now, in this example, maybe it's not too large,

but if my data was longer,

and if my pattern was much longer,

it is easy to see that this number can be very large.

So obviously, if I am trying to support an exploratory search,

exploratory access to data,

this is not going to be scalable.

I need something much faster.

If I'm given a pattern,

and if I'm given a long sequence,

I want to be able to find the pattern in the data much faster.

So the question is that, how do I achieve this? That is the question.

So, in the next few slides, essentially,

we will learn a few data structures that

are available to support the subsequence or pattern search efficiently.

So let's see what those are.

Well, let's go back to our try data structure.

Now, as we said,

try is a very efficient data structure,

and it works very well,

but it's only applicable for prefixes.