Reading modes

1. Quick skim to glean the basic hypothesis of the article/text
2. Careful read to check the logical consistency
3. Extra-careful read to see if it’s well written (usually a function of brevity and coherence)
4. Editor mode: check spelling, grammar, and punctuation, actually read it out loud, etc.

I mostly indulge in 1 or 2. I am trying to reduce mode 1 reading and increase mode 3 reading.
Let’s see.

Chennai Impressions — part II

Chennai has been a big revelation. I am seeing things in a very different light:
more cynical, more objective. Today there is some propaganda playing some song here.
A few observations:
Rhetoric involves
1. Presenting correlation as causality
2. Overwrought metaphors
3. Meme/idea/thought repetition

While the ADMK seems to take a more communist/socialist approach, the DMK seems to take the “Tamil”, self-respect, authoritarian approach.

Generating a Plain Text Corpus from Wikipedia

After the Deadline

AtD *thrives* on data and one of the best places for a variety of data is Wikipedia. This post describes how to generate a plain text corpus from a complete Wikipedia dump. This process is a modification of Extracting Text from Wikipedia by Evan Jones.

Evan’s post shows how to extract the top articles from the English Wikipedia and make a plain text file. Here I’ll show how to extract all articles from a Wikipedia dump with two helpful constraints. Each step should:

  • finish before I’m old enough to collect social security
  • tolerate errors and run to completion without my intervention
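To make those two constraints concrete, here is a minimal sketch, not the toolkit's actual code. It assumes the dump uses the usual `<page>`/`<title>`/`<text>` layout, streams it page by page so memory stays flat, and skips any article whose conversion fails instead of aborting. The names `extract_pages` and `fragile_convert` and the sample data are hypothetical, made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def extract_pages(stream, convert):
    """Yield (title, plain_text) per <page>, skipping pages that fail."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        try:
            plain = convert(text)
        except Exception as exc:
            # Tolerate one bad article: log it and keep going.
            print(f"skipped {title!r}: {exc}")
            elem.clear()
            continue
        elem.clear()  # drop this page's content so memory stays flat
        yield title, plain


def fragile_convert(wikitext):
    """Hypothetical converter that fails on some input."""
    if "BOOM" in wikitext:
        raise ValueError("cannot parse markup")
    return wikitext.strip()


# Tiny made-up dump; a real one would be opened from a file instead.
SAMPLE = b"""<mediawiki>
<page><title>Paris</title><revision><text>Capitale de la France</text></revision></page>
<page><title>Broken</title><revision><text>BOOM</text></revision></page>
<page><title>Lyon</title><revision><text>Ville francaise</text></revision></page>
</mediawiki>"""

pages = list(extract_pages(io.BytesIO(SAMPLE), fragile_convert))
```

Because `extract_pages` is a generator, a real run could write each article to disk as it arrives rather than collecting everything in a list.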

Today, we’re going to do the French Wikipedia. I’m working on multi-lingual AtD and French seems like a fun language to go with. Our systems guy, Stephane, speaks French. That’s as good a reason as any.

Step 1: Download the Wikipedia Extractors Toolkit

Evan made available a…


S/w engineering disease

So much for the case for fixing bugs as early as possible and testing as much as possible before release, with TDD being a tool to help with that.

Still, the point of having a test suite is that it reduces the amount of context you have to load into your brain when you modify or add functionality to old code. One look at the test cases will give you a better idea of which changes will affect which related features than reading through the code will. Test-driven development is one extreme form of this and is not recommended for everyone.
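As a rough illustration of tests doubling as a summary of behaviour, here is a small sketch. The function `apply_discount` and its rules are hypothetical, invented only for this example. The point is that the four test names alone tell you what must keep working before you touch the code.

```python
import unittest


def apply_discount(price, percent):
    """Hypothetical legacy function: return price reduced by percent."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (100 - percent) / 100, 2)


class ApplyDiscountTest(unittest.TestCase):
    # Each test name states one piece of behaviour a change must preserve.

    def test_zero_percent_leaves_price_unchanged(self):
        self.assertEqual(apply_discount(80.0, 0), 80.0)

    def test_full_discount_is_free(self):
        self.assertEqual(apply_discount(80.0, 100), 0.0)

    def test_result_is_rounded_to_cents(self):
        self.assertEqual(apply_discount(9.99, 15), 8.49)

    def test_out_of_range_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(80.0, 120)
```

Saved in a file, this can be run with `python -m unittest`; if someone later changes the rounding, the failing test name says which behaviour broke.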

Runner’s high is for real

Holy zarquon, runner’s high is for real, i.e. it’s not an overwrought metaphor. The first thing to go when I drink is my control over my pronunciation. I jumble things up (syllables, words, you name it): not much slurring, just spoonerisms. Anyway, today after some quick rounds, I was thinking and recalling a quote in the context of that thought. I recalled the right quote, but I kept reordering the words, and it took me 2-3 tries to get the order right. Note that all of this was just recalling (i.e. verbalizing only in the mind, not actually vocalizing).