# Seven secrets of shiva.

I thought of you when I read this quote from “Seven secrets of Shiva” by DEVDUTT PATTANAIK –

“Culture by its very nature makes room for some practices and some people, and excludes others. Thieves and criminals and ghosts and goblins have no place in culture.”

. Given the choice between the Farm and the Organization, he picked the Organization. I would have too. I have yet to meet a TPS report so onerous I would prefer to be handpicking cotton in Tennessee in August.

# The Cauvery Water debate — opinions

This was inspired by the controversy around cauvery water debate in mid-september 2016.

Before we begin, I’ll set down my biases and priors and  assumptions:

1.    I’m from TN, living in bangalore for about 10 years.
2.    I’m unaware of the actual rain level, agricultural needs, ecological needs and others.
3.   I’m not going to propose a verdict as much as a process/method to deal with the conflicts that don’t depend on politicians or supreme court.
4.  I’ve travelled in TN, to most parts in my youth, and trekked to most parts of Karnataka(speaking broken kannada) in the last 10 years,(obviously not as much as TN) and have a grasp on the cultural/mental attitudes in general.

One reason I’m ruling out political solution is because we live in a representational democracy.  The way the incentives are in that setup are for the politicians to do what gets them the most votes from the biggest part of their demographics. Trying to expect them to talk to politicians from other state and  come to a compromise is hard because on top of representational democracy, we have a  multi-party system.  Which means, there’s scope for local parties to not care about the interest of the other state parties and people. I’ve seen a few national parties taking contradictory stances, based on which state’s division they are making statements from.  In addition to this incentives, this is a situation with  prisoner’s dilemma type dynamics(i.e: to say, if one agent stops co-operating and defects, then the rest are better off doing the same.).  The only rewards for the politicians in this are media-time and vote bank support .

So what I do advocate is a mix of open data and predictive models plus persuasion and media (attention) frenzy that’ll overtake anything like the top-down media stuff the politicians can stir up. It won’t work without both of them, but I have no clue/idea about what will be successful and what will not in the latter case, so will focus majority of the post on the first.

Advocating open data access (water level, population, catchment area, drought area, cultivation area, predicted loss of agricultural area etc….) managed /maintained by a panel of experts, but open to all for debating and opining..

Major points(on the open data front):

1.    Make the data open and easily  available . Here the data will be catchment areas, agricultural need estimates, actual rainfall, water table levels, water distribution wastage/efficiency, sand mining and their effects on water flow, economic impacts of the water shortage(bankruptcies, loss of revenue, loss of investment etc..). (There are some platforms like this and this already in India)*
2.    Create/use a open data science platforms let bloggers and volunteers, modify the models (for estimates) and make blogs/predictions based on the given data, but with different models and parameters. (Some tools can be found here and here)
3. Try to present the models in a way they can be interacted with even by people without programming experience. (The notebook links i provided above need python knowledge to edit, but anything built with this won’t)
4.  Add volunteers to cross-check some of the data, like sand-mining, rain fall level, etc..
5. Publish/collaborate with reporters to inform/write stories around the issue, with the help of models.(something with atleast the level of science journalism.)

Some thoughts(on the media – based persuasion front):

1.  Recruit enough people interested in the exercise of figuring out details about impact of the issue.
2. Make sure you can reach the ones that are currently most likely to indulge in violence(I can only guess at details, but  better targeted marketing strategy is what we need).

O.k: Enough of the idealistic stuff.

1. Will this work? Well the question is too broad. Will it work to bring out the truth.. Ah it can bring us closer to the truth than what  we have.  And more importantly can define/establish a method/platform for us to get closer to data-driven debates and/or arguments.
2. Will it cut the violence/bandhs/property-damage etc? Well that  boils down to the media and marketing front activism or work done. Leaving that persuasion part to politicians with skewing incentives towards gaining votes(from the steady voting population) is the problem now.  So can we have alternative parties (say,  business owners,) trying to use persuasion tactics only to discourage violence? I don’t know, but it seems likely that violence and martyrdom is preferred mostly by politicians and dons, but not the rest.(say media, local business owners, sheep-zens etc…). So this move has lower expected probability of violence.
3. Who will pay for all this effort? Ah.. a very pertinent question. The answer is well, it’s going to be hard to pay the costs of even maintaining a information system, not to mention the cost of collecting the data.. That said, I think the big challenge is in the cost of collecting the data, and finding volunteers(something like this in US) to collect it for free. As for the hosting, building and maintaining an information system, I think there can be a cheap way found.
4.  Is this likely to happen? Haha… no.. not in the next half century or so..
5. Is there a cheaper way? Ah.. Not in the global/community/country level.. But at the individuals(media/politicians/public(aka u and me) ) sense yes, but it’s not really a cheaper way in the cost it inflicts. May be I’m just not creative enough, feel free to propose one, just be careful to include the costs to others around you now and others to come in the future.(aka your children)
6. Why will this work? Well apart from the mythical way of saying “Sunlight is the best disinfectant”, I think this approach is basically an ambiguity- reduction approach, which translates to breaking down of status illegibility. (One reason no politician is likely to support this idea.) Status illegibility is the foundation of socio-political machinations and it applies to modern day state politics. So this will raise the probability of something close to a non-violent solution.
• — I haven’t checked whether these data-sets  are already openly available, but I doubt they are and even if they are, some of the data are estimates, and we would need the models that made the estimates too to be public.

UPDATE: A few weeks after this I looked up on the google maps, the path followed by cauvery from origin to it’s end at the sea, and realized, I’ve actually visited more of the places it flows through in Karnataka and a lot fewer in TamilNadu. But that doesn’t change my stance/bias on misuse/abuse of sand mining and lake resources as housing projects in TN as that’s a broader , pervasive and pertinent issue.

UPDATE-1: A few months after writing this there was a public announcement, which if you read close enough is a typical persuation-negotiation move, with a specific action(and strong concession, and right now) demanded from the opponent, in exchange for a vague, under-specified promise in the future. This whole thing was on the news, is more support for my thesis that the incentives for politicians are skewed too much towards PR.

UPDATE-2:  Some platforms for hosting data, models and code do exist as below(although with different focus):

so the question of collecting, cleaning, verifying and updating data is left.Also here’s a quora answer on the challenges of bootstrapping a data science team, which will be needed for this.

# Regularization

Based on a small post found here.

One of the standard problems in ML with meta modelling algorithms(Algorithms that run multiple statistical models over given data and identifies the best fitting model. For ex: randomforest  or the rarely practical genetic algorithm) ,  is that they might favour overly complex models that over fit the given training data but on live/test data perform poorly.

The way these meta modelling algorithms work is they have a objective function(usually the RMS of error of the stats/sub model with the data)  they pick the model based on.(i.e: whichever model yields lowest value of the objective function).  So we can just add a complexity penalty(one obvious idea is the rank of the polynomial that model uses to fit, but how does that work for comparing with exponential functions?)  and the objective function suddenly becomes RMS(Error) + Complexity_penalty(model).

Now depending on the right choice of Error function and Complexity penalty this can find models that may perform less than more complex models on the training data, but can perform better in the live model scenario.

The idea of complexity penalty itself is not new, I don’t dare say ML borrowed it from scientific experimentation methods or some thing but the idea that the more complex a theory or a model it should be penalized over a simpler theory or model is very old. Here’s a better written post on it.

# Statistical moments —

Inspired by this blog from paypal team. Moment is a physics concept(or atleast I encountered it first in physics, but it looks it has been generalized in math to apply to other fields).

If you followed that wikipedia math link above, you’ll know the formula for moment is
$\mu_n = \int\limits_{-\infty}^{+\infty} (x-c)^n f(x)\ dx\$
where x — the value of the variable
n — order of the moment (aka nth moment, we’ll get to that shortly)
c — center or value around which to calculate the moment.

However if you look at a few other pages and links they ignore that part c.. and of course use the summation symbol.**

The reason they don’t put up ‘c’ there is they assume moment around the value 0. As we’ll see below this is well and good in some cases, but not always.

The other part n- order of the moment is an interesting concept. It’s just raising the value to nth power. To begin with if n is even the the negative sign caused by differences goes away. So it’s all a summary and becomes a monotonically increasing function.

I usually would argue that the ‘c’ would be the measure of central tendency like mean/median/mode and a sign of fat-tailed/thin-tailed distributions is that the moments will be different if you choose a different c and the different moments change wildly.

The statsblog I linked above mentions something different.

Higher-order terms(above the 4th) are difficult to estimate and equally difficult to describe in layman’s terms. You’re unlikely to come across any of them in elementary stats. For example, the 5th order is a measure of the relative importance of tails versus center (mode, shoulders) in causing skew. For example, a high 5th means there is a heavy tail with little mode movement and a low 5th means there is more change in the shoulders.

Hmm.. wonder how or why? I can’t figure out how it can be an indication of fat-tails(referred by the phrase importance of tails in the quote above) with the formula they are using. i.e: when the formula doesn’t mention anything about ‘c’.

** — That would be the notation for comparing discrete variables as opposed to continuous variables, but given that most of real-world application of statistics uses sampling at discrete intervals, it’s understandable to use this notation instead of the integral sign.

# Continuity of a function.

Most of us, would have studied(likely in high school) about the idea of functions being continuous.

As the wikipedia section states, we end up with 3 conditions for a function over an interval [a,b].

• The function should be defined at a constant value c
• The limit has to exist.
• The value of the limit must equal to c.

Now this is a perfectly useful notion for most of the functions we encounter in high school. But there are functions that would satisfy these three conditions, but still won’t be helpful for us to move forward. And I just discovered while reading up for my self-educating on statistics. One moment there I was trying to understand what’s the beta distribution, or for that matter what sense does it make to talk about a probability density function, I mean understand what’s probability, but how can it have density and that sort of thing.. I lose focus a few seconds, and find myself tumbling down a click-hole to find a curiouser idea about 3 levels/types(ordered by strictness) of continuity of a function namely

The last one being what we studied and what I described above.
Now let’s get Climb up one more step on the ladder of abstraction and see what’s uniform continuity.
Ah we have five more types of continuous there namely

Ok I won’t act vogonic and try to understand or explain all of those . I just put them out there to tease feel free to click your way in.

To quote first line from the uniform continuous wiki:

In mathematics, a function f is uniformly continuous if, roughly speaking, it is possible to guarantee that f(x) and f(y) be as close to each other as we please by requiring only that x and y are sufficiently close to each other; unlike ordinary continuity, the maximum distance between f(x) and f(y) cannot depend on x and y themselves. For instance, any isometry (distance-preserving map) between metric spaces is uniformly continuous.

So what does this mean and how does it differ from the ordinary continuity? Well they say it up there that the maximum distance between f(x) and f(y) cannot depend on x and y themselves. i.e: the dist.function: df(f(x), f(y)) has neither x or y in it’s expression/input/right hand side.

The more formal definition can be quoted like this:

.
Given metric spaces (X, d1) and (Y, d2), a function f : X → Y is called uniformly continuous if for every real number ε > 0 there exists δ > 0 such that for every x, y ∈ X with d1(x, y) < δ, we have that d2(f(x), f(y)) < ε.

Now why would this be relevant or useful and why is it higher/stricter than ordinary continuity. Well note that it doesn’t say anything about an interval. The notion of ordinary continuity is always defined on an interval in the input space and clearly is confined to that. i.e: it is a property that is local to the given interval in input/domain space and may or may not apply on other different intervals.

On the other hand if you can say the function is uniformly continuous you’re effectively saying the function is continuous at all intervals.

Now how do we find a more general definition(i.e: absolute continuity?) Well look at the 3 conditions we defined at the start of this blog post. The first two can be collapsed to say the function must be differentiable over the given interval [a,b]. The third is the distance/measure concept we used in uniform continuous definition to remove the bounds on the interval and say everywhere. So obviously for the absolute continuous definition we do the next step and say the function must be uniformly continuous and differentiable everywhere.(aka uniformly differentiable).

Ok all of this is great, except where the hell is this useful. I mean are there function that belong to different continuous classes, so that these definitions/properties and theorems built on these can be used to differentiate and reason about functions. Turns out there are . I’ll start with something i glimpsed on my way down the click-hole, the cantor distribution. . It’s the exception that causes as to create a new class of continuous. It’s neither discrete nor continuous.

It’s distribution therefore has no point mass or probability mass function or probability density function.* Therefore throws a lot of the reasoning/theorem systems for a loop.

For the other example i.e: something ordinarily continuous but not uniformly continuous see here. . It’s a proof by contradiction approach.

* — Ok, I confess, the last point about point mass and probability mass/density function still escapes me. I’ll revisit that later, perhaps this time with the help of that excellent norvig’s ipython notebook on probability.

# September 6 – OMG! GMO! — Little facts about science

Today’s factismal: We’ve been using genetically-modified organisms to save lives for 38 years. If you keep up with the news, you are aware that there is a lot of arguing going on over the use of genetically-modified organisms, also known as GMOs to the acronym-lovers out there. On the one hand, there are those who […]

# Indian startups advice — 3.0

As I tend to be brash and stupid and offer advice/criticisms about running startups before(here and here and here.), I’ll do it again.

I shall dispense this advice now.

1. Treat your early employees more like partners than wage slave.*

2. This follows from the previous one. After every hire(and fire) re-consider your selection process.

3. Remember the Charlie Munger advice on trust here (or quoted below)**.

4. The best problem solvers, prefer to focus on solving the problem(s) and go right on to the next problem. They would much rather leave the performance reviews, raises (promised at the time of joining) etc.. to others. So if you do promise any review and raise based on that, follow through, Don’t delay with “we’ll do this in a formal setting in two weeks” dodge and then fail to follow through. You won’t build the best possible team with that approach.

5. Find the product/market fit..(Meh. I’m not qualified to say much about this without hands-on finding one).

5. Build a monopoly niche. Don’t compete on price, use your skills and knowledge to build a big manic monopoly, that would be the biggest barrier of entry to any competitors.
By the way the last two are just me re-gurgitating what I think makes sense from what I have read around. Only currently experimenting with implementing them.

** – “The highest form that civilization can reach is a seamless web of deserved trust — not much procedure, just totally reliable people correctly trusting one another. … In your own life what you want is a seamless web of deserved trust. And if your proposed marriage contract has forty-seven pages, I suggest you not enter.”
Source: Wesco Financial annual meeting, 2008 (quoted in Stanford Business School paper)

* — Note how I didn’t say anything about politeness or good salary or on time salary etc. That’s because all of those can be wrong ones to emphasize. My whole reason for this point is that they should have skin-in-the-game. Everything else can be worked around. Just don’t do this.

# Bayesians vs Frequentists(aka sampling theorists)

This is a long standing debate/argument and like most polarized arguments, both sides have some valid and good reasons for their stand. (There goes the punchline/ TLDR). I’ll try to go a few levels deeper and try to explain the reasons why I think this is kind of a fake argument. (Disclaimer: am just a math enthusiast, and a (willing-to-speculate) novice. Do your own research, if this post helps as a starting point for that, I’d have done my job.)

• As EY writes in this post about how bayes theorem is  a law that’s more general and should be observed over whatever frequentist tools we have developed?
• If you read the original post carefully, he doesn’t mention the original/underlying distribution, guesses about it or confidence interval(see calibration game)
•  He points to a chapter(in the addendum) here.
• Most of the post otherwise is about using tools vs using a general theory and how the general theory is more powerful and saves a lot of time
•  My first reaction to the post was but obviously there’s a reason those two cases should be treated different. They both have the same number of samples, but different ways of taking the samples. One sampling method(one who does sample till 60% success) is a biased way of gathering data .
• As a different blog and some comments point out, if we’re dealing with robots(deterministic algorithmic data-collector) that precisely take data in a rigourous deterministic algorithmic manner the bayesian priors are the same.
• However in real life, it’s going to be humans, who’ll have a lot more decisions to make about considering a data point or not. (Like for example, what stage of the patient should be when he’s considered a candidate for the experimental drug)
• The point I however am going to make or am interested in making is related to known-unknowns vs unknown-unknowns debate.
• My point being even if you have a robot collecting the data, if the underlying nature of the distribution is unknown-unknown(or for that matter depends on a unknown-unknown factor, say location, as some diseases are more widespread in some areas)  mean that they can gather same results, even if they were seeing different local distributions.
• A contiguous point is that determining the right sample size is a harder problem in a lot of cases to be confident about the representativeness of the sample.
• To be fair, EY is not ignorant of this problem described above. He even refers to it a bit in his 0 and 1 are not probabilities post here. So the original post might have over-simplified for the sake of rhetoric or simply because he hadn’t read The Red queen.
• The Red queen details about a bunch of evolutionary theories eventually arguing that the constant race between parasite and host immune system is why we have sex  as a reproductive mechanism and we have two genders/sexes.
• The medicine/biology example is a lot more complex system than it seems so this mistake is easier to make.
• Yes in all of the cases above, the bayesian method (which is simpler to use and understand) will work, if the factors(priors) are known before doing the analysis.
• But my point is that we don’t know all the factors(priors) and may not even be able to list all of them, let alone screen, and find the prior probability of each of them.

With that I’ll wind this post down. But leave you with a couple more posts I found around the topic, that seem to dig into more detail. (here and here)

P.S: Here’s a funny Chuck Norris style facts about Eliezer Yudkowsky.(Which I happened upon when trying to find the original post and was not aware of before composing the post in my head.) And here’s an xkcd comic about frequentists vs bayesians.

UPDATE-1(5-6 hrs after original post conception): I realized my disclaimer doesn’t really inform the bayesian prior to judge my post. So here’s my history/whatever with statistics. I’ve had trouble understanding the logic/reasoning/proof behind standard (frequentist?) statistical tests, and was never a fan of rote just doing the steps. So am still trying to understand the logic behind those tests, but today if I were to bet I’d rather bet on results from the bayesian method than from any conventional methods**.

UPDATE-2(5-6 hrs after original post conception):  A good example might be the counter example. i.e: given the same data(aka in this case frequency of a distribution, nothing else really, i.e: mean, variance, kurtosis or skewness) show that bayesian method gives different results based on how it(data) was collected and frequentist doesn’t. I’m not sure it’s possible though given the number of methods frequentist/standard methods use.

UPDATE-3 (a few weeks after original writing): Here’s another post about the difference in approaches between the two.

UPDATE-4 (A month or so after): I came across this post with mentions more than two buckets, but obviously they are not all disjoint sets(buckets).

UPDATE-5(Further a couple of months after): There’s a slightly different approach to splitting the two cultures from a different perspective here.

UPDATE-6: A discussion in my favourite community can be found here.
** — I might tweak the amount I’d bet based on the results from it .

# A rule to avoid black swans

In fact, any time anybody offers you anything with a big commission and a 200-page prospectus, don’t buy it. Occasionally, you’ll be wrong if you adopt “Munger’s Rule”. However, over a lifetime, you’ll be a long way ahead—and you will miss a lot of unhappy experiences that might otherwise reduce your love for your fellow man.

https://old.ycombinator.com/munger.html