0:07

This lecture is about using a time series as context to potentially discover causal topics in text. We're going to continue discussing Contextual Text Mining; in particular, we're going to look at a time series as a context for analyzing text, to potentially discover causal topics.

As usual, let's start with the motivation. In this case, we hope to use text mining to understand a time series. What you are seeing here is the Dow Jones Industrial Average stock price curve, and you'll see a sudden drop here. Right. So one would be interested in knowing what might have caused the stock market to crash.

0:48

Well, if you know the background, you might be able to figure it out by looking at the timestamp, or there is other data that can help us think about it. But the question here is: can we get some clues about this from the companion news stream? We have a lot of news data that was generated during that period.

1:08

If we do that, we might actually discover that the crash happened at the time of the September 11 attacks, and that's the time when there was a sudden rise, in the news articles, of the topic about September 11.

1:26

Here's another scenario, where we want to analyze a presidential election. This is a time series from a presidential prediction market; for example, the Iowa Electronic Markets would have stocks for each candidate. If you believe one candidate will win, then you tend to buy the stock for that candidate, causing that candidate's price to increase. So that's a nice way to actually survey people's opinions about these candidates.

2:10

Or, in a social science study, you might be interested in knowing what mattered in this election: what issues really matter to people. Again, in this case, we can look at the companion news stream and ask the question: are there any clues in the news stream that might provide insight about this? So, for example, we might discover that the mention of tax cuts

2:35

has been increasing since that point. So maybe that's related to the drop in the price. Now, all these cases are special cases of a general problem: the joint analysis of text and time series data to discover causal topics. The input in this case is a time series plus text data produced in the same time period, the companion text stream.

3:28

Now, we call these topics Causal Topics. Of course, strictly speaking, they're not causal topics; we are never going to be able to verify whether there is a true causal relationship here. That's why we put "causal" in quotation marks. But at least they are correlated topics that might potentially explain the cause, and humans can certainly further analyze such topics to understand the issue better.

3:59

And the output would contain topics, just as in topic modeling. But we hope these are not just the regular topics: they don't have to explain the text data the best, but rather they have to explain the data in the text well, meaning they have to represent semantically meaningful topics in the text. More importantly, they should be correlated with the external time series that's given as the context.

So, to understand how we solve this problem, let's first consider solving it with a regular topic model, for example PLSA. We can apply it to the text stream, and with some extension like CPLSA, or Contextual PLSA, we can discover the topics and also their coverage over time.

5:05

But this approach is not going to be very good. Why? Because we are restricted to the topics that will be discovered by PLSA or LDA, and that means the choice of topics will be very limited. We know these models try to maximize the likelihood of the text data, so those topics tend to be the major topics that explain the text data well, and they are not necessarily correlated with the time series. Even if we take the best one, the most correlated topic might still not be very good.

5:37

So in the work cited here, a better approach was proposed, called Iterative Causal Topic Modeling. The idea is to iteratively adjust the topics discovered by topic models, using the time series to induce priors.

6:09

We're going to use the external time series to assess which topics are more causally related, or correlated, with it. So we have something to rank them: we might find that Topic 1 and Topic 4 are more correlated, while Topic 2 and Topic 3 are not. Now, we could have stopped here, and that would be just like the simple approach that I talked about earlier: we take these topics and call them causal topics. But as I also explained, these topics are unlikely to be very good, because they are general topics that explain the whole text collection; they are not necessarily the topics best correlated with our time series.

6:51

So what we can do in this approach is first zoom in to the word level: we look at each word in the top-ranked word list for each topic. Let's say we take Topic 1 as the target to examine. We know Topic 1 is correlated with the time series, or at least it's the best we could get from this set of topics so far.

7:23

And if the topic is correlated with the Time Series,

Â there must be some words that are highly correlated with the Time Series.

Â So here, for example, we might discover W1 and W3 are positively

Â correlated with Time Series, but W2 and W4 are negatively correlated.

Â 7:41

So, as a topic, it's not good to mix words with different correlations. We can then separate these words: we get all the (red) words that indicate positive correlation, W1 and W3, and we also get another subtopic from the negatively correlated words, W2 and W4.

8:07

Now, these subtopics, or these variations of the topics based on the correlation analysis, are still quite related to the original Topic 1. But they already deviate from it, because the time series information was used to bias the selection of words. So in some sense, and we should expect so, they are more correlated with the time series than the original Topic 1, because Topic 1 had mixed words and here we have separated them.

8:46

So these subtopics can be expected to be better correlated with the time series. However, they may not be as coherent semantically.

Â So the idea here is to go back to topic model by using these

Â each as a prior to further guide the topic modeling.

Â And that's to say we ask our topic models now discover topics that

Â are very similar to each of these two subtopics.

Â And this will cause a bias toward more correlate to the topics was a time series.

Â Of course then we can apply topic models to get another generation of topics.

Â And that can be further ran to the base of the time series to set after the highly

Â correlated topics.

Â And then we can further analyze the components at work in the topic and

Â then try to analyze.word level correlation.

Â And then get the even more correlated subtopics that can be

Â further fed into the process as prior to drive the topic of model discovery.
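The loop just described can be caricatured end to end in a toy script. Everything here is a simplified stand-in: a real system would use PLSA or LDA with a conjugate prior, whereas this "model" merely blends corpus frequencies with the prior. It only shows how the feedback iteratively re-biases a topic toward time-series-correlated words; all names and numbers are invented.

```python
# Toy iterative loop: fit a (stand-in) topic model, keep the words whose
# coverage correlates positively with the external series, feed them back
# as the next round's prior, and repeat.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def fit_topic(counts, prior, mix=0.5):
    """Stand-in topic model: blend corpus word frequencies with the prior."""
    total = sum(counts.values())
    dist = {w: (1 - mix) * c / total + mix * prior.get(w, 0.0)
            for w, c in counts.items()}
    z = sum(dist.values())
    return {w: p / z for w, p in dist.items()}

def feedback_prior(topic, word_series, external):
    """Keep the topic's positively correlated words as the next prior."""
    keep = {w: p for w, p in topic.items()
            if pearson(word_series[w], external) > 0}
    z = sum(keep.values())
    return {w: p / z for w, p in keep.items()}

counts = {"tax": 30, "cut": 25, "game": 40, "score": 35}  # corpus counts
word_series = {                                           # coverage over time
    "tax":   [1, 2, 4, 6],
    "cut":   [0, 1, 3, 5],
    "game":  [5, 4, 2, 1],
    "score": [6, 5, 3, 2],
}
prices = [10, 12, 15, 19]                                 # external series

prior = {}                          # first round: no guidance
for _ in range(3):                  # alternate model fit and feedback
    topic = fit_topic(counts, prior)
    prior = feedback_prior(topic, word_series, prices)

print(sorted(prior))                # ['cut', 'tax']: correlated subtopic
print(topic["tax"] > topic["game"])  # True: topic has shifted toward it
```

Note how "game" starts as the most frequent word but loses weight to "tax" and "cut" once the price series drives the prior.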

9:46

So this whole process is just a heuristic way of optimizing causality and coherence, and that's our ultimate goal. Right? So here you see that pure topic models will be very good at maximizing topic coherence; the topics will all be meaningful.

10:02

If we only use a causality test, or a correlation measure, then we might get a set of words that are strongly correlated with the time series, but they may not necessarily mean anything; they might not be semantically connected. That would be the other extreme, at the top.

10:21

Now, the ideal is to get causal topics that score high both in topic coherence and in causal relation. This approach can be regarded as an alternating way of maximizing both dimensions. When we apply the topic models, we are maximizing coherence. But when we decompose a topic's words into sets of words that are strongly correlated with the time series, selecting the most strongly correlated words, we are pushing the model back along the causal dimension, to make it better in causal scoring. And then, when we apply the selected words as a prior to guide the topic modeling, we go back to optimizing coherence, because topic models ensure that the next generation of topics is coherent. So we can iterate, alternately optimizing in this way, as shown in this picture.

11:20

So the only component of such a framework that you haven't seen yet is how to measure the causality; the rest is just topic modeling. So let's have a little bit of discussion of that. Let's say we have a topic about government response here. Then, from topic modeling, we can get the coverage of the topic over time. So we have a time series, X sub t.

11:43

Now, we are also given a time series that represents external information. It's a non-text time series, Y sub t; here, the stock prices. Now the question is: does Xt cause Yt?

12:08

There are many measures that we can use in this framework. For example, Pearson correlation is a commonly used measure. And we have to consider a time lag here, so that we can try to capture a causal relation: using the data of X in the past

12:26

to correlate with the points of Y that represent the future, for example. By introducing such a lag, we can hopefully capture some causal relation even when using a correlation measure like Pearson correlation.
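Here is a minimal sketch of such a lagged correlation. The numbers are invented: past values of the topic coverage X are lined up against future values of the price series Y.

```python
# Lag-aware Pearson correlation: shift X back by `lag` steps so past
# values of X are compared against future values of Y.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def lagged_correlation(x, y, lag):
    """Correlate x_t with y_{t+lag}: x leads y by `lag` steps."""
    return pearson(x[:len(x) - lag], y[lag:])

topic_coverage = [0.1, 0.4, 0.9, 0.5, 0.2, 0.1]  # X_t, e.g. a news topic
stock_price = [30, 31, 28, 20, 24, 29]           # Y_t

# A topic surge at time t shows up in the price one step later:
print(round(lagged_correlation(topic_coverage, stock_price, 1), 2))  # -0.98
```

The strong negative lag-1 value suggests that a rise in the topic precedes a drop in the price, which is the kind of clue this framework looks for.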

12:52

Another measure is the Granger causality test, and the idea of this test is actually quite simple. Basically, you build an autoregressive model that uses the history of Y to predict itself; this is the best we could do without any other information. Then we add some history information of X into that model, to see if we can improve the prediction of Y. If we can, with a statistically significant difference, then we say X has some causal influence on Y.

13:32

If, on the other hand, the difference is insignificant, that would mean X does not really have a causal relation to Y. So that's the basic idea. Now, we don't have time to explain this in detail, but you can read the cited reference here to learn more about this measure. It's a very convenient measure that has many applications.
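To make the comparison concrete, here is a bare-bones sketch of the Granger idea: fit Y on its own lagged value, then refit with lagged X added, and compare the residual errors. This is a simplified illustration with one lag and no significance test (a real Granger test computes an F statistic and p-value from these residual sums); the data is synthetic.

```python
# Compare an autoregressive model of Y against the same model augmented
# with lagged X, via ordinary least squares solved by hand.

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [bv] for row, bv in zip(a, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(n):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [rv - f * iv for rv, iv in zip(m[r], m[i])]
    return [m[i][n] / m[i][i] for i in range(n)]

def rss(rows, y):
    """Least-squares fit of y on the given regressor rows;
    return the residual sum of squares."""
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * yv for r, yv in zip(rows, y)) for i in range(k)]
    beta = solve(ata, atb)
    return sum((yv - sum(c * v for c, v in zip(beta, r))) ** 2
               for r, yv in zip(rows, y))

def granger_improvement(x, y):
    """RSS of Y ~ lagged Y, versus Y ~ lagged Y + lagged X (one lag)."""
    target = y[1:]
    restricted = [[1.0, y[t]] for t in range(len(y) - 1)]
    unrestricted = [[1.0, y[t], x[t]] for t in range(len(y) - 1)]
    return rss(restricted, target), rss(unrestricted, target)

# Toy data where Y simply echoes X with a one-step delay:
x = [1, 3, 2, 5, 4, 6, 5, 7]
y = [0, 1, 3, 2, 5, 4, 6, 5]
r0, r1 = granger_improvement(x, y)
print(r0 > r1)  # True: adding lagged X sharply improves the prediction of Y
```

Because Y here is exactly X delayed by one step, the augmented model's residual error collapses to essentially zero, which is the signature the Granger test looks for.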

13:55

So next, let's look at some sample results generated by this approach. Here the data is the New York Times, in the time period of June 2000 through December 2011. The time series we used is the stock prices of two companies, American Airlines and Apple, and the goal is to see whether, by injecting such a time series as context, we can actually get topics that are biased toward the time series. Imagine if we don't use any input, any context: then the topics discovered from the New York Times by PLSA would just be the general topics that people talk about in news. All right, those major topics in the news.

14:41

But here you see these topics are indeed biased toward each time series. In particular, if you look at the underlined words in the American Airlines result, you see airlines, airport, air, united, trade, terrorism, etc. So it clearly has topics that are more correlated with the external time series. On the right side, you see that some of the topics are clearly related to Apple: you can see computer, technology, software, internet, com, web, etc. So that just means the time series has effectively served as a context to bias the discovery of topics. From another perspective, these results tell us not just what people have talked about in each case, but which topics might be correlated with the stock prices. So these topics can serve as a starting point for people to further look into the issues and find the true causal relations.

Â Here are some other results from analyzing

Â Presidential Election time series.

Â The time series data here is from Iowa Electronic market.

Â And that's a prediction market.

Â And the data is the same.

Â New York Times from May 2000 to October 2000.

Â That's for 2000 presidential campaign election.

Â Now, what you see here

Â are the top three words in significant topics from New York Times.

Â 16:21

If you look at these topics, they are indeed quite related to the campaign; actually, the issues are very much related to the important issues of this presidential election. Now here I should mention that the text data was filtered to use only the articles that mention the candidates' names.

16:53

But the results here clearly show that the approach can uncover some important issues in that presidential election. Tax cut, oil and energy, abortion, and gun control are all known to have been important issues in that election, and that is supported by some literature in political science.

17:35

So there are two suggested readings here. One is the paper about this iterative topic modeling with time series feedback, where you can find more details about how this approach works. The second is a reading about the Granger causality test.

17:55

So, in the end, let's summarize the discussion of text-based prediction. Text-based prediction is generally very useful for big data applications that involve text, because it can help us infer new knowledge about the world, and that knowledge can go beyond what's discussed in the text.

18:28

Text data is often combined with non-text data for prediction, because for this prediction purpose, we generally would like to combine non-text data and text data, using as much of both as possible. As a result, joint analysis of text and non-text data is necessary and also very useful.

Â Now when we analyze text data together with non-text data,

Â we can see they can help each other.

Â So non-text data, provide a context for mining text data, and

Â we discussed a number of techniques for contextual text mining.

Â And on the other hand, a text data can also help interpret

Â patterns discovered from non-text data, and this is called a pattern annotation.

Â