Where the heck have I been???

For those of you who blame me for not staying in touch for the past few weeks, sorry! I have been busy with the activities below and have been out of touch with humanity.
1. Working on my research on “Name Discrimination” for the Web People Search (WePS) task corpus.

2. Designing and developing a small proof of concept (POC) for clustered processing of the above data.

3. For task 2, building a small three-machine cluster of Ubuntu virtual machines on my desktop using VMware.

Now, that is what has taken up most of my spring break, and it will probably take some more time to become reality. Below is my progress on each item.

1. WePS task: I have nearly completed the program to convert the data from the corpus file structure (XML and HTML files) to plain text, and then finally into Senseval-2 format XML files to be clustered by the SenseClusters software.

2. What I have been thinking about is that my converter program reads each of these files sequentially and converts it into a Senseval-2 XML file. The file conversions are all independent of each other, so they can be run in parallel. The same goes for the clustering task: as it stands, we have 79 names to be discriminated, and each name is represented by one XML file. The tasks for each name are therefore atomic and independent of the tasks for every other name, and I wish to run these in parallel too. I am not exploiting function-level parallelism, but for now I do wish to exploit the data parallelism that already exists.

3. For the parallel processing I am trying to set up my own cluster of virtual machines. This will give me hands-on experience in creating a cluster, a cheap test bed for my parallel programs, and something to play around with for a few days. I have created 3 identical virtual machines with SenseClusters installed. The machines are named Alang, Madan and Kulang, after 3 forts in Maharashtra, India. I am in the process of creating the cluster from these virtual machines. Each of them has 512 MB of RAM and an 8 GB hard disk, along with a dual-core AMD Athlon CPU. The RAM limit is due to the physical memory I have available: I have only 3 GB with me :( … I will update you all on this front soon (once I set up the cluster, which might not be before the weekend, and I also have an exam coming up, so this is kind of on the back burner…). Till then, that's it from my side…
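
To make the data parallelism from item 2 a bit more concrete, here is a minimal Python sketch of the idea, not my actual converter. Everything in it is an assumption for illustration: the names convert_one, convert_all and TAG are hypothetical, and the "conversion" is only a stand-in that strips markup to plain text (it does not produce Senseval-2 XML). It also runs worker processes on a single machine rather than across a cluster. The point is only that each file is handled on its own, so the files can be handed out to workers in any order.

```python
import re
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

# Naive tag matcher, good enough for a sketch (not a real HTML/XML parser).
TAG = re.compile(r"<[^>]+>")


def convert_one(src, out_dir):
    """Stand-in converter: strip markup from one file and write plain text."""
    src, out_dir = Path(src), Path(out_dir)
    text = TAG.sub(" ", src.read_text(encoding="utf-8", errors="replace"))
    text = " ".join(text.split())  # collapse runs of whitespace
    dest = out_dir / (src.stem + ".txt")
    dest.write_text(text, encoding="utf-8")
    return str(dest)


def convert_all(sources, out_dir, workers=3, executor_cls=ProcessPoolExecutor):
    """Convert every file independently; no file needs another file's result."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    sources = list(sources)
    with executor_cls(max_workers=workers) as pool:
        # map() gives one file to each free worker and returns results
        # in the same order as the inputs.
        return list(pool.map(convert_one, sources, [out_dir] * len(sources)))
```

Calling convert_all(list_of_files, "out") would then spread the files over 3 worker processes. On platforms that start workers by spawning a fresh interpreter, that call needs to sit under an if __name__ == "__main__": guard. The executor class is a parameter so that a thread pool can be swapped in for quick experiments.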

Wish me luck! Keep your messages flowing in… and I will try to stay in touch in future… ;-)