Visualizing Similarity Sets
by Ismail Fahmi | December 25, 2006
One interesting topic in natural language processing is finding word similarity in a document collection. For example, given a set of news documents (the more the better), we can find which names are similar to the word “Fiat” (the name of a famous car maker). Can you guess the first seven words that are most closely related (similar in some way) to “Fiat”?
If you click this (opens in a new window), you will get a graph like the one in the following figure:

In the news document collection, Fiat is most similar to Volkswagen, Toyota, Opel, Nissan, Renault, Honda, and Peugeot. Intuitively, we know that these are all names of popular car makers. I will not explain how this similarity information can be extracted from a document collection; you can read my colleague’s paper here. Instead, I will show how we can visualize sets of similar words interactively.
If you click a node, for example Toyota, new words related to Toyota are added to the graph. Keep clicking on any nodes that interest you, and a graph will emerge showing how names or words are related to each other, as shown in the following figure (click to enlarge).
In the graph above, we can see a path showing how the car company Fiat is related to the software giant Microsoft. This kind of similarity information can be obtained from documents with a distributional similarity method.
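To give a rough idea of what "distributional similarity" means, here is a minimal sketch in Python. It is not the method from my colleague's paper: it simply assumes that two words are similar when they occur next to the same neighbouring words, and compares their context counts with cosine similarity. The toy sentences, the window size, and the function names are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """For every word, count the words occurring within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence.lower())
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[ctx] for ctx, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(word, vectors, top_n=7):
    """Return the `top_n` words whose context vectors are closest to `word`."""
    target = vectors.get(word, Counter())
    scores = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy corpus, invented for this example; a real run needs many news documents.
sentences = [
    "fiat sells small cars in europe",
    "toyota sells small cars in asia",
    "microsoft sells software worldwide",
]
vectors = context_vectors(sentences)
print(most_similar("fiat", vectors, top_n=3))
```

With real data, the top seven results for each word are what ends up as the neighbours of a node in the graph.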
I use TouchGraph to visualize that information. It is an open source Java applet that generates an animated graph from a set of linked nodes. The second graph above was generated from the following dataset:
<TOUCHGRAPH_LB version="1.20">
<NODESET>
<NODE nodeID="Fiat">
<NODE_LOCATION visible="true"/>
<NODE_LABEL label="Fiat" shape="2" backColor="ffffcc" textColor="000000" fontSize="12" />
<NODE_URL url="http://odur.let.rug.nl/fahmi/sets/explore.php?id=Fiat" urlIsLocal="false" urlIsXML="true"/>
</NODE>
<NODE nodeID="Nissan">
<NODE_LOCATION visible="true"/>
<NODE_LABEL label="Nissan" shape="2" backColor="ffffcc" textColor="000000" fontSize="12" />
<NODE_URL url="http://odur.let.rug.nl/fahmi/sets/explore.php?id=Nissan" urlIsLocal="true" urlIsXML="true"/>
</NODE>
<NODE nodeID="Honda">
<NODE_LOCATION visible="true"/>
<NODE_LABEL label="Honda" shape="2" backColor="ffffcc" textColor="000000" fontSize="12" />
<NODE_URL url="http://odur.let.rug.nl/fahmi/sets/explore.php?id=Honda" urlIsLocal="true" urlIsXML="true"/>
</NODE>
…
</NODESET>
<EDGESET>
<EDGE fromID="Fiat" toID="Nissan" type="1" length="65" visible="true" color="99cc99" />
<EDGE fromID="Fiat" toID="Honda" type="1" length="65" visible="true" color="99cc99" />
<EDGE fromID="Fiat" toID="Opel" type="1" length="65" visible="true" color="99cc99" />
<EDGE fromID="Fiat" toID="Renault" type="1" length="65" visible="true" color="99cc99" />
<EDGE fromID="Fiat" toID="Toyota" type="1" length="65" visible="true" color="99cc99" />
<EDGE fromID="Fiat" toID="Volkswagen" type="1" length="65" visible="true" color="99cc99" />
<EDGE fromID="Fiat" toID="Peugeot" type="1" length="65" visible="true" color="99cc99" />
</EDGESET>
<PARAMETERS>
<PARAM name="offsetX" value="0"/>
<PARAM name="rotateSB" value="0"/>
<PARAM name="zoomSB" value="0"/>
<PARAM name="offsetY" value="0"/>
</PARAMETERS>
</TOUCHGRAPH_LB>
TouchGraph can be modified or adapted for a new data set, as long as the data contains information about nodes and the links between them. I also modified the code a bit to display a network of persons, as shown in my other post.
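Since the dataset is just nodes plus links, it is easy to generate from any similarity table. The following Python sketch builds a file with the same element and attribute names as the dataset shown in this post, using the standard xml.etree.ElementTree module. The function name, the `neighbours` dictionary, and the `base_url` argument are my own illustration here, and I have only copied the attribute values (shape, colors, edge length) from the example; I make no claim that this covers everything TouchGraph accepts.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def build_touchgraph_xml(neighbours, base_url):
    """Build a TouchGraph-style dataset from {word: [similar words]}.

    Element and attribute names mirror the example dataset in this post;
    `base_url` is a hypothetical prefix for the per-node expansion URL.
    """
    root = ET.Element("TOUCHGRAPH_LB", version="1.20")
    nodeset = ET.SubElement(root, "NODESET")
    edgeset = ET.SubElement(root, "EDGESET")

    # Collect every word once, keeping first-seen order.
    words = []
    for source, targets in neighbours.items():
        for word in [source, *targets]:
            if word not in words:
                words.append(word)

    for word in words:
        node = ET.SubElement(nodeset, "NODE", nodeID=word)
        ET.SubElement(node, "NODE_LOCATION", visible="true")
        ET.SubElement(node, "NODE_LABEL", label=word, shape="2",
                      backColor="ffffcc", textColor="000000", fontSize="12")
        ET.SubElement(node, "NODE_URL", url=base_url + quote(word),
                      urlIsLocal="false", urlIsXML="true")

    # One edge per (word, similar word) pair.
    for source, targets in neighbours.items():
        for target in targets:
            ET.SubElement(edgeset, "EDGE", fromID=source, toID=target,
                          type="1", length="65", visible="true",
                          color="99cc99")

    # Same four parameters as in the example dataset, all set to 0.
    parameters = ET.SubElement(root, "PARAMETERS")
    for name in ("offsetX", "rotateSB", "zoomSB", "offsetY"):
        ET.SubElement(parameters, "PARAM", name=name, value="0")

    return ET.tostring(root, encoding="unicode")


# Example input: two of Fiat's neighbours from the graph above.
xml_text = build_touchgraph_xml(
    {"Fiat": ["Nissan", "Honda"]},
    base_url="http://odur.let.rug.nl/fahmi/sets/explore.php?id=",
)
print(xml_text)
```

Building the tree with a library instead of string concatenation means closing tags and attribute quoting are handled for you, and words containing special characters are escaped properly.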