Visualizing Similarity Sets

by Ismail Fahmi | December 25, 2006

One interesting topic in natural language processing is finding similar words in a document collection. For example, given a set of news documents (the more the better), we can find which names are similar to the word “Fiat” (the name of a famous car brand). Can you guess the first seven words that are most closely related (similar in some way) to “Fiat”?

If you click this (opens in a new window), you will get the graph shown in the following figure:

graph-fiat.gif

In the news document collection, Fiat is most similar to Volkswagen, Toyota, Opel, Nissan, Renault, Honda, and Peugeot. Intuitively, we recognize all of them as names of popular automobile makers. I will not explain how these similarity sets can be extracted from a document collection. You can read my colleague’s paper here. Instead, I will show how we can visualize sets of similar words interactively.


If you click a node, for example Toyota, new words related to Toyota are added to the graph. Keep clicking on any nodes that interest you, and a graph will emerge showing how names or words are related to each other, as shown in the following figure (click to enlarge).

graph-setall.gif
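The click-to-expand behaviour can be sketched in a few lines of Python. This is only an illustration of the idea, not TouchGraph's own code: the `SIMILAR` table and the `expand` function are hypothetical names, and apart from the Fiat entry, which comes from the list above, the similarity sets are invented.

```python
# Hypothetical similarity sets: each word maps to its most similar words.
# Only the Fiat entry comes from the article; the rest is invented.
SIMILAR = {
    "Fiat": ["Volkswagen", "Toyota", "Opel", "Nissan",
             "Renault", "Honda", "Peugeot"],
    "Toyota": ["Honda", "Nissan", "Sony"],
    "Sony": ["Toyota", "Microsoft"],
}


def expand(nodes, edges, word, similar=SIMILAR):
    """Simulate a click on `word`: add its similar words and their edges."""
    nodes.add(word)
    for neighbour in similar.get(word, []):
        nodes.add(neighbour)
        # An undirected edge is stored once, whichever end was clicked.
        edges.add(frozenset((word, neighbour)))
    return nodes, edges


nodes, edges = set(), set()
expand(nodes, edges, "Fiat")    # first click: Fiat and its seven neighbours
expand(nodes, edges, "Toyota")  # second click: adds Sony and three new edges
print(sorted(nodes))
print(len(edges))
```

Because nodes and edges are kept in sets, clicking the same node twice does not duplicate anything, so the graph only grows when a click brings in words that were not shown yet.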

In this larger graph, we can see a path showing how the car company Fiat is related to the software giant Microsoft. Similarity information of this kind can be extracted from documents using a distributional similarity method.
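Finding such a path by eye works for a small graph. For a larger one, a breadth-first search does the same job. The sketch below is a minimal example under stated assumptions: the `LINKS` list is invented (the intermediate words are not taken from the real graph), and `build_graph` and `shortest_path` are hypothetical helper names.

```python
from collections import deque

# Hypothetical undirected similarity links. The intermediate words are
# invented; the real graph may connect Fiat and Microsoft differently.
LINKS = [
    ("Fiat", "Toyota"),
    ("Toyota", "Sony"),
    ("Sony", "Microsoft"),
    ("Fiat", "Opel"),
]


def build_graph(links):
    """Turn a list of word pairs into an adjacency mapping."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search: return the words from start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # remembers how each word was reached
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            path = []
            while word is not None:  # walk back to the start
                path.append(word)
                word = previous[word]
            return path[::-1]
        for neighbour in sorted(graph[word]):
            if neighbour not in previous:
                previous[neighbour] = word
                queue.append(neighbour)
    return None


path = shortest_path(build_graph(LINKS), "Fiat", "Microsoft")
print(" -> ".join(path))  # prints "Fiat -> Toyota -> Sony -> Microsoft"
```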

I use TouchGraph to visualize that information. It is an open source Java applet that generates an animated graph from a set of linked nodes. The second graph above is generated using the following dataset:

<TOUCHGRAPH_LB version="1.20">
<NODESET>
<NODE nodeID="Fiat">
<NODE_LOCATION visible="true"/>
<NODE_LABEL label="Fiat" shape="2" backColor="ffffcc" textColor="000000" fontSize="12" />
<NODE_URL url="http://odur.let.rug.nl/fahmi/sets/explore.php?id=Fiat" urlIsLocal="false" urlIsXML="true"/>
</NODE>
<NODE nodeID="Nissan">
<NODE_LOCATION visible="true"/>
<NODE_LABEL label="Nissan" shape="2" backColor="ffffcc" textColor="000000" fontSize="12" />
<NODE_URL url="http://odur.let.rug.nl/fahmi/sets/explore.php?id=Nissan" urlIsLocal="true" urlIsXML="true"/>
</NODE>
<NODE nodeID="Honda">
<NODE_LOCATION visible="true"/>
<NODE_LABEL label="Honda" shape="2" backColor="ffffcc" textColor="000000" fontSize="12" />
<NODE_URL url="http://odur.let.rug.nl/fahmi/sets/explore.php?id=Honda" urlIsLocal="true" urlIsXML="true"/>
</NODE>…
</NODESET>
<EDGESET>
<EDGE fromID="Fiat" toID="Nissan" type="1" length="65" visible="true" color="99cc99" />
<EDGE fromID="Fiat" toID="Honda" type="1" length="65" visible="true" color="99cc99" />
<EDGE fromID="Fiat" toID="Opel" type="1" length="65" visible="true" color="99cc99" />
<EDGE fromID="Fiat" toID="Renault" type="1" length="65" visible="true" color="99cc99" />
<EDGE fromID="Fiat" toID="Toyota" type="1" length="65" visible="true" color="99cc99" />
<EDGE fromID="Fiat" toID="Volkswagen" type="1" length="65" visible="true" color="99cc99" />
<EDGE fromID="Fiat" toID="Peugeot" type="1" length="65" visible="true" color="99cc99" />
</EDGESET>
<PARAMETERS>
<PARAM name="offsetX" value="0"/>
<PARAM name="rotateSB" value="0"/>
<PARAM name="zoomSB" value="0"/>
<PARAM name="offsetY" value="0"/>
</PARAMETERS>
</TOUCHGRAPH_LB>
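A dataset like this does not have to be written by hand. The following is a minimal sketch, using Python's standard `xml.etree.ElementTree`, of how such a file could be generated from a mapping of words to their similar words. The element and attribute names, the colours, and the URL pattern are copied from the dataset above; the function name `build_dataset` is hypothetical, and setting `urlIsLocal` to "true" for every node is an assumption.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# URL pattern as it appears in the dataset above.
BASE_URL = "http://odur.let.rug.nl/fahmi/sets/explore.php?id="


def build_dataset(similar):
    """Build a TOUCHGRAPH_LB element from a {word: [similar words]} mapping."""
    root = ET.Element("TOUCHGRAPH_LB", version="1.20")
    nodeset = ET.SubElement(root, "NODESET")
    edgeset = ET.SubElement(root, "EDGESET")

    # Collect every word once, in first-seen order.
    words = []
    for word, neighbours in similar.items():
        for w in [word, *neighbours]:
            if w not in words:
                words.append(w)

    for w in words:
        node = ET.SubElement(nodeset, "NODE", nodeID=w)
        ET.SubElement(node, "NODE_LOCATION", visible="true")
        ET.SubElement(node, "NODE_LABEL", label=w, shape="2",
                      backColor="ffffcc", textColor="000000", fontSize="12")
        # Assumption: every node is marked urlIsLocal="true".
        ET.SubElement(node, "NODE_URL", url=BASE_URL + quote(w),
                      urlIsLocal="true", urlIsXML="true")

    for word, neighbours in similar.items():
        for neighbour in neighbours:
            ET.SubElement(edgeset, "EDGE", fromID=word, toID=neighbour,
                          type="1", length="65", visible="true",
                          color="99cc99")

    params = ET.SubElement(root, "PARAMETERS")
    for name in ("offsetX", "rotateSB", "zoomSB", "offsetY"):
        ET.SubElement(params, "PARAM", name=name, value="0")
    return root


root = build_dataset({"Fiat": ["Nissan", "Honda", "Opel"]})
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Building the tree with a library rather than by string concatenation means the closing tags and attribute quoting are always well-formed.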

TouchGraph can be modified or adapted for a new data set, as long as the data contain information about nodes and the links between them. I also modified the code a bit to display a network of persons, as shown in my other post.


