Insight porn

About a month ago, I unsubscribed from Ribbonfarm’s RSS feed (on my Google Reader). I did it in an effort to cut my read posts on Reader from ~100 to ~50 a month. But I did not really think a lot about why. I know I spend just that little extra time and/or attention reading Ribbonfarm posts. I also know that what I learn from Ribbonfarm is valuable. But the number of posts per month convinced me to drop it and instead read it by visiting in the browser. Of course, thanks to my browsing of historical old posts, it was already on my block list for when I am in ‘work’ mode.

I hadn’t considered the tradeoff (attention vs. lesson/learning value) till I saw this post today. He refers to another blog, which talks about the ‘insight porn’ type of blogs. I managed not to click on it and go reading around, yay. Anyway, given that I have been reading Ribbonfarm, I do tend to write a blog post** on top of a specific Ribbonfarm post and feel a very shallow/temporary rush of work done. Note that it is nowhere close to what I get when I read up some math and try to sum it up, or just take notes/thoughts on it.

Anyway, I realized that in a sense Venkat is right. He’s generating insights that are useful, but not change-your-world deep. At least not any more. I won’t pretend to have been following him since 2008 to know this, but I do know I can get some idea or other of a post’s direction with a little skimming. This is probably an effect of my going and digging around his archived posts, reading for a couple of hours at a time. Anyway, these are porn in the sense that they sometimes have surprising perspectives and interesting metaphors, but sometimes they are just one- or two-level logical connections from some well-known principles. (A side effect being that you become a lazier thinker.*) I guess that’s where the 20-80 rule/problem comes up. In fact, I originally thought I would go and make a list of his blog posts that are not insight porn and those that are, and give feedback, before realizing that list is going to be different for different sets/types of people.

Besides, I don’t think there’s any reliable way (that I can suggest) for him to measure (nay, grok) the distribution to make the blog better.

Takeaways:

1. I now vow to write fewer posts surrounding/developing on Ribbonfarm posts, and definitely no new ones.

2. I vow to read Ribbonfarm on an intermittent + serendipity-seeking basis. I now have a list of sites for serendipity, and LessWrong is moving up that list.

3. Forming a list of stuff to write on a habit basis (some mix of math + open-source s/w): not yet ready.

4. Progress enough to move John D. Cook’s blog from the Google Reader subscriptions to the serendipity reading list, and instead get something like the P=NP blog as a subscription.

 

* — I guess, realistically, we all optimize for some kinds of thinking and become lazy thinkers in some area or another.

** — One of my own examples is this. It doesn’t really qualify as insight porn (it falls short of offering reasonably useful insights); it is not at all sophisticated, but very crude. It definitely qualifies as a traffic-generator post, though.

 

UPDATE 1: Ironically enough, this post itself seems to have become a bit of a clicks attractor.

UPDATE 2: OK, I realized a couple of things. The site traffic monitors (the one built into WordPress, and Google Analytics), both perhaps useful if checked once a month or so, are a pain if checked every day (which is what I had started doing). The problem with checking every day, or once in two days, is that it is very easy to settle for the local maximum of clicks coming in from trackback links on existing popular blogs. My core faults so far.

Python list vs dictionary.

Was talking to a colleague (a .NET developer) and ended up lecturing him about how an array (list in Python) is a specific type of data structure, namely a specific type of associative array. Now, the logic goes like this. An associative array (dict in Python) is a key-value store: a method of storing data in the format of a key mapped to a value. It is usually implemented with the help of a good hash function.

Anyway, the only constraint is that any new insertion has to be of the format (key, value), where the key is a hashable* value.

Add one more condition, that the keys have to be whole numbers in increasing order (starting at 0), and you have an array/list.
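To make that concrete, here is a minimal sketch (the class and method names are mine, purely illustrative, nothing to do with CPython’s actual implementation) of a ‘list’ built on top of a dict by forcing the keys to be consecutive whole numbers:

```python
# A toy illustration: a "list" is just an associative array whose
# keys are forced to be 0, 1, 2, ... in insertion order.
class ListAsDict:
    def __init__(self):
        self._store = {}  # the underlying key-value store

    def append(self, value):
        # the next key is forced to be the current length
        self._store[len(self._store)] = value

    def __getitem__(self, index):
        return self._store[index]

xs = ListAsDict()
xs.append("a")
xs.append("b")
print(xs[1])  # -> "b", behaving just like a real list index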

This discussion/lecture got me thinking about how it would be implemented at the Python core language level. I promise to actually check the open-source code base and write a summary later, but for now here are my thoughts/guesses, after some pruning over a walk.

1. A list, by virtue of having whole numbers for keys, will be easier to access; i.e., it can be stored at constant-interval locations in memory. (I know Python being dynamically typed, and Python lists being capable of storing different types of values in one list, complicates things, but only at the implementation level. In theory, I can just put a pointer in that memory slot to the real memory where the value is stored (you know, in case of a huge wall of text that doesn’t fit in the memory intervals).) Effect? Accessing a list can be done in constant time, O(1).**

2. A dictionary, since it can have an arbitrary data type as key, cannot be assumed to have constant memory spacing. But then we have a hash function, i.e., we pass our key to a function that (ideally) returns a unique hash value for any unique key. Now the lookup becomes two-fold (sketched in code after this list). First:

a. hash the key to get its hash value,

b. search the table for the entry whose slot matches that hash value.
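Here is a crude sketch of that two-step lookup: a toy open-addressing table of my own invention, just the shape of the idea, not anything resembling CPython’s actual dictobject code:

```python
class ToyHashTable:
    """A crude hash table, only to show the two lookup steps.
    No resizing, no deletion; it breaks if you overfill it."""

    def __init__(self, size=8):
        self._slots = [None] * size  # each slot holds (key, value) or None

    def put(self, key, value):
        i = hash(key) % len(self._slots)   # step a: hash the key
        while self._slots[i] is not None and self._slots[i][0] != key:
            i = (i + 1) % len(self._slots) # collision: probe the next slot
        self._slots[i] = (key, value)

    def get(self, key):
        i = hash(key) % len(self._slots)   # step a: hash the key
        while self._slots[i] is not None:  # step b: search from that slot
            if self._slots[i][0] == key:
                return self._slots[i][1]
            i = (i + 1) % len(self._slots)
        raise KeyError(key)

t = ToyHashTable()
t.put("spam", 42)
print(t.get("spam"))  # -> 42
```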

Now, what is the big-O time for this? My first thought is: well, it depends on

a. the hash function implementation, and

b. the table size, or rather the range of hashed values with respect to the dictionary size.

Anyway, this reminded me of an older post I had made and the excursions into the CPython source I had made at that time. I clearly remember a comment in the Objects/dictobject.c/.h files about the hash function being special enough to make lookup O(1). I did not really get it at that time and will need to check the code + comment again in context, but the basic reasoning, as I remember it, is that by ignoring most of the outlier cases and assuming the simplest/most common distributions of keys, they can keep lookup at O(1). Will update more some time.
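In the meantime, a rough empirical check is easy enough: if lookup really is O(1)-ish, the time for a fixed number of lookups should stay roughly flat as the dict grows. (Numbers will vary by machine; this is a sanity check, not a proof.)

```python
import timeit

# If dict lookup is O(1)-ish, these timings should stay roughly flat
# even as n grows by orders of magnitude.
for n in (10**3, 10**5, 10**7):
    d = {i: i for i in range(n)}
    t = timeit.timeit(lambda: d[n - 1], number=100_000)
    print(f"n={n:>9}: {t:.4f}s for 100k lookups")
```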

 

 

** — Turns out not exactly constant time, but time invariant with respect to the number of elements in the list. In the case of pointers, there will be variation in time depending on the size of the element stored, but the first lookup is a simple constant-time table lookup.

* — by our hash function. But generally not a file stream, socket handle, port handle, device file, IPC pipe, etc.