Monday, October 30, 2006

I (heart) Python


So I have finally decided to sit down and try to learn some of this programming mumbo-jumbo. After glancing over a couple of the beginner's guides, I have settled on Guido van Rossum's Python Tutorial. I find that his guide is much clearer on indenting than some of the others, but the one thing that I would really appreciate is a syntax guide. Even a list of the most basic functions would be helpful, as it would spare me from constantly growling at the computer.
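For my own sake, here is the sort of cheat sheet I have in mind: a few of the built-in functions that seem to come up constantly (my own scratch notes, not anything copied out of the tutorial):

```python
# Scratch notes: a few built-in functions that seem to come up constantly.
words = ["context", "subtext", "syntax"]

print(len(words))        # len() -> how many items are in the list (3)
print(sorted(words))     # sorted() -> a new, alphabetized copy of the list
print(max(words))        # max() -> the "largest" item (alphabetically, here)
print(", ".join(words))  # join() -> glue the strings together with a separator

for i in range(len(words)):   # range() -> the numbers 0, 1, 2 for looping
    print(words[i])
```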

Wednesday, October 25, 2006

Context-hunting computers?


During an in-class discussion the other day, a question was posed about how a computer program could be made to search a digitized text for its context (or was it subtext?).

This got me thinking about a tool that can be easily accessed on the internet: the Linguist's Search Engine. Basically, it is a program that can break down a phrase or sentence into its syntactic parts. So, if you enter the statement "The Chinese community was blamed for the 2003 SARS pandemic", the program will identify that the subject of the statement is "the Chinese community".
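I have no idea what the Linguist's Search Engine actually runs under the hood, but just to sketch the idea for myself, here is how a parsing library could pull the subject out of that sentence (this uses spaCy, a library of my own choosing, not anything from the site):

```python
# Rough sketch using spaCy's dependency parser (not the Linguist's Search Engine itself).
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The Chinese community was blamed for the 2003 SARS pandemic.")

for token in doc:
    # "nsubj"/"nsubjpass" mark the (passive) subject of the sentence
    if token.dep_ in ("nsubj", "nsubjpass"):
        # token.subtree gives the full noun phrase: "The Chinese community"
        subject = " ".join(t.text for t in token.subtree)
        print("Subject:", subject)
```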

What I think would be pretty awesome is a tool that can take a search query (like the one stated above) and first identify the syntactic role of the various word-strings. Following this, it could build an ad hoc store not only of the separate word-strings, but also of word-strings semantically related to them (i.e., synonyms, probably the first three entries of an online dictionary). This way, you could have a program that performs a broad, non-specific search for a particular phrase.
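To make the synonym part concrete, here is a rough sketch of that query-expansion step, using WordNet as a stand-in for the online dictionary (my assumption, not a real design):

```python
# Sketch of query expansion: take each content word from the query and collect
# a few synonyms for it from WordNet (standing in here for the "online dictionary").
# Requires: pip install nltk, then nltk.download("wordnet")
from nltk.corpus import wordnet as wn

query_terms = ["community", "blamed", "pandemic"]

expanded = {}
for term in query_terms:
    synonyms = []
    for synset in wn.synsets(term)[:3]:          # roughly "the first three entries"
        for lemma in synset.lemma_names():
            word = lemma.replace("_", " ")
            if word != term and word not in synonyms:
                synonyms.append(word)
    expanded[term] = synonyms

for term, syns in expanded.items():
    print(term, "->", syns)
```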

I had always wondered if there was a way to graphically represent people's reactions following a particular event in the news. With a program like this, I believe that it would be possible to find out how many people thought, e.g., that the Michigan Militia (re: T. McVeigh) was responsible for the attacks of September 11, 2001, before al-Qaeda released a video taking responsibility.
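Just to convince myself that the counting step is simple, here is a toy sketch with made-up posts and dates (nothing here is real data):

```python
# Toy sketch: count how many posts per day mention a given phrase.
# The posts and dates are invented purely for illustration.
from collections import Counter

posts = [
    ("2001-09-11", "some people are saying the michigan militia did this"),
    ("2001-09-11", "no word yet on who is responsible"),
    ("2001-09-12", "the michigan militia rumour is still going around"),
    ("2001-09-13", "a video has surfaced claiming responsibility"),
]

phrase = "michigan militia"
mentions_per_day = Counter(date for date, text in posts if phrase in text.lower())

for day in sorted(mentions_per_day):
    print(day, "#" * mentions_per_day[day])   # crude bar "graph" in the terminal
```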

I'm not too sure how to make a program like this work, or even if it is feasible. BUT, in theory it seems sort of cool.

Any thoughts?

Saturday, October 21, 2006

Faking it on the Web


The proliferation of internet "digital hoaxes" brings several things to mind:

First: the urban legends

These popular pictures spread as attachments that were forwarded a billion times across the web. Some of them were just obviously ridiculous, such as the "Bert is Evil" campaign. However, there were others that seemed at least slightly plausible. They were often grotesque images with an accompanying explanation. I'm pretty sure that everyone got at least one of them: the "eye-infecting dust-worms", "breast infection", and "contaminated sushi" seemed to be the most famous. In the instance of the breast infection, a doctored photograph did the trick, but the other two used actual pictures; the stories describing them were invented.

I believe that our fascination with this sort of spectacle can be connected to the 18th-century fascination with cabinets of curiosities. The only difference is that the 'curiosities' can now come to us. In the past, people could question the veracity of an exhibit simply because of the company it was presented alongside. Now that these curiosities arrive via email, the onus of determining a thing's "truthiness" falls on the viewer's own ability to check sources, as well as their willingness to be skeptical, or at least agnostic.

Second: Mechanical Objectivity

I think that our visual gullibility can be connected to a cultural acceptance of what Lorraine Daston and Peter Galison have termed "mechanical objectivity". Basically, with the invention of the camera, people believed that they finally had a way to present reality without worrying about bias in the presentation. Now, this was back in the 19th century: since then we have also learned to be skeptical of photographs. However, we can't forget that the idiom "pictures don't lie" is based on the assumption that a camera always discloses the truth.

Third: Political manipulation of imagery

Once the public accepted the notion that "pictures don't lie", various groups, individuals, and institutions tried to use this belief against us for their own rhetorical ends. During the 2004 election campaign, a doctored image circulated showing John Kerry alongside "Hanoi Jane" Fonda at an anti-Vietnam War rally. Now, I do not necessarily believe that this sort of manipulation resulted in Kerry's defeat…the man did it to himself (or rather, did not do it)…but it is clear that whoever doctored the image did it for propagandist ends.
As an historian, I tend to historicize things. In this case, doctoring photographs for propagandistic ends is an old tactic, and one that does not depend on digital technology. The case of Trotsky disappearing from Lenin's side was accomplished, after all, through analog methods.

Fourth: Coded Objectivity?

The article on Farid’s truth-seeking algorithms makes me think about our own era’s notions of objectivity. If the truth is no longer in the camera (which can be manipulated), does this mean that the truth is in the code?

Friday, October 06, 2006

Spiders crawling all over the web...


JavaWorld.com has a very interesting article on creating web spiders. The article is pretty technical; however, after parsing through it there is a lot of useful information to be gleaned. Using a web spider is kind of like using Google. However, as I understand it, the information returned by a web spider follows a directed path of links starting from the "root": whatever website you choose to have the spider start following links from.
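The demo itself is written in Java, but the basic move (fetch a page, pull out its links) looks something like this in the Python I am trying to learn (a sketch of the idea, not the article's code):

```python
# Sketch of the basic spider move: fetch one page and collect the links on it.
# This is my own Python sketch, not the Java demo from the article.
from html.parser import HTMLParser
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

root = "https://example.com/"          # the "root" the spider starts from
html = urlopen(root).read().decode("utf-8", errors="ignore")

collector = LinkCollector()
collector.feed(html)
print(collector.links)
```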

At the bottom of the article there is a downloadable demo program of a spider. It is a very fun program to play around with. Once I became used to the advantages and disadvantages of the various breadth and depth settings, I was able to return some interesting results.

For instance, if you set the maximum search depth to "100", this particular program will follow each link until it has travelled 100 links from the root. At that point it will begin the "breadth" portion of the search, which involves travelling along each of the links found on each website, until it can travel no farther.

So, it seems as though the demo spider has two modes. First, in depth mode, it travels along the first link it encounters until it cannot travel any farther, or it reaches the maximum number of sites to travel along. Following this, it "backtracks" to each site and explores any other links available, until there are no more, or it reaches the maximum depth specified.
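Here is my reading of that depth-limited, backtracking traversal, sketched in Python against a tiny fake "web" (again, not the demo's actual code):

```python
# Sketch of the depth-limited, backtracking traversal as I understand it:
# follow links depth-first, never going more than max_depth links from the root.
def crawl(url, max_depth, get_links, visited=None, depth=0):
    if visited is None:
        visited = set()
    if depth > max_depth or url in visited:
        return visited
    visited.add(url)
    print("  " * depth + url)
    for link in get_links(url):                                 # get_links() would fetch the
        crawl(link, max_depth, get_links, visited, depth + 1)   # page and extract its links
    return visited

# Tiny fake "web" so the sketch runs without touching the network:
fake_web = {
    "root": ["a", "b"],
    "a": ["c"],
    "b": ["a", "d"],
    "c": [],
    "d": [],
}
crawl("root", max_depth=2, get_links=lambda u: fake_web.get(u, []))
```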

As a tool, this strikes me as extremely useful since it searches an entire website for you. Rather than hunting through a site map for specific content and links, you can make the spider do it for you! If you wanted to create a database of links on a particular subject, this program performs an exhaustive search. Granted, you still have to go through the material yourself. However, there is an added tool in the program that makes that task much easier.

You can enter keywords for it to pay attention to, and the program will highlight any webpages that match the criteria you specify. This is not like a keyword search on Google, as it does not isolate the sites that return the keyword. However, it is useful for "seeing" the shape of the net.
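The keyword-flagging idea seems simple enough to sketch too; something along these lines, with made-up pages standing in for whatever the spider returns:

```python
# Sketch of the keyword-flagging idea: keep every page the spider visits, but
# mark the ones whose text contains any of the keywords you care about.
keywords = ["SARS", "pandemic"]

# pretend these came back from a crawl: (url, page_text) pairs, invented here
pages = [
    ("https://example.com/news", "Coverage of the 2003 SARS outbreak..."),
    ("https://example.com/recipes", "Ten ways to cook rice..."),
]

for url, text in pages:
    hits = [kw for kw in keywords if kw.lower() in text.lower()]
    marker = "*" if hits else " "
    print(f"{marker} {url}  {hits}")
```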