# Central Tendency — measures

The three most common measures of central tendency used in statistics are:

1. Mean
2. Median
3. Mode

There are, of course, other measures, as the Wikipedia page attests. However, the inspiration for this post was yet another of J.D. Cook’s blog posts.

Note: none of these three (nor the other measures) is a measure in the measure-theoretic sense; “measure” is used informally here.

The point is that the measure you choose to describe central tendency matters, and the choice should depend on what you want to do with it. More precisely: decide what you want to optimize your process/setup/workflow for, and choose the measure accordingly. If you read the post above, you’ll understand the following:

Note that even the mean comes in multiple types. For simplicity, within this post “mean” means the arithmetic mean.

• Mean: the mean is a good choice when you want to minimize the sum of squared distances from the central value (the second moment about that value, which at the mean equals the variance). That’s to say your objective function is dominated by squared-distance terms. Think of minimizing mean squared error, and how it’s used in least-squares straight-line fitting.
• Median: the median is more useful if your objective function has absolute-distance terms rather than squared ones. It is the value that minimizes the sum of absolute distances from the center.
• Midrange: the midrange is useful when your objective looks like max(distance from the central value); it minimizes the largest distance.
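To check those claims numerically (plus the mode, which the list above leaves out), here is a minimal brute-force sketch. It assumes a small made-up data set and a grid of candidate centers with step 0.01; the helper names are hypothetical. The mode is searched over the data values themselves, because the count of mismatched values can only be minimized at a data value.

```python
import statistics

def brute_force_center(data, loss, step=0.01):
    """Return the grid point in [min(data), max(data)] that minimizes loss(data, c)."""
    lo, hi = min(data), max(data)
    steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(steps + 1)]
    return min(grid, key=lambda c: loss(data, c))

def squared_loss(data, c):    # sum of squared distances: minimized by the mean
    return sum((x - c) ** 2 for x in data)

def absolute_loss(data, c):   # sum of absolute distances: minimized by the median
    return sum(abs(x - c) for x in data)

def max_loss(data, c):        # largest distance: minimized by the midrange
    return max(abs(x - c) for x in data)

def mismatch_count(data, c):  # number of values != c: minimized by the mode
    return sum(1 for x in data if x != c)

data = [1, 2, 2, 3, 7]  # small made-up sample
print(brute_force_center(data, squared_loss), statistics.mean(data))
print(brute_force_center(data, absolute_loss), statistics.median(data))
print(brute_force_center(data, max_loss), (min(data) + max(data)) / 2)
# The mismatch count can only be minimized at a data value, so search those.
print(min(set(data), key=lambda c: mismatch_count(data, c)), statistics.mode(data))
```

Each printed pair should agree up to the grid step: squared distances pick out the mean (3), absolute distances the median (2), the maximum distance the midrange (4), and mismatch counting the mode (2).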

If most of that sounded too abstract, here’s a practical application I can think of right away. Imagine you’re doing performance testing and optimization of a small API you’ve built. I won’t go into what kind of API it is or the technology behind it. Let’s just assume you run it many times, compute a measure of central tendency from the run times, and then try to improve the code’s performance (with profiling, different libraries/data structures, whatever…). Which measure of central tendency should you pick?

• Mean: Most engineers would pick the mean, and in a lot of cases it’s enough, but think about it. It corresponds to minimizing squared distances (a variance-style objective), so a few slow runs weigh heavily. That’s important and useful to optimize in most cases, but in some cases it may not matter much.
• Midrange: An example is if your system is a small component of, say, a high-frequency trading platform, and its consumer has a timeout and fails if your call times out (i.e., your API is mission-critical; it simply cannot fail). Then you want to make sure your program completes even in the worst case. If the worst-case runtime is what you want to lower, pick the midrange, since it minimizes the maximum distance. (Note that this is still a trade-off: you may give up improvements to the average/mean case, as with any hard choice.)
• Median: This is similar to the mean, except that it doesn’t square distances, so a few unusually slow runs don’t drag it around. If you pick the median, you are optimizing for the typical run/case/dataset.
• Mode: Well, this is an interesting case. Think about it: even in the timeout example above, it could be useful. Suppose your API is not mission-critical (i.e., if it fails, the overall algorithm just throws out that data point and carries on with other data sources). Then you want to maximize the number of times your program finishes within the timeout; you’re purely counting how often you return a value within the timeout period, and you don’t care about the worst case. That is closest in spirit to the mode, which minimizes the number of values that differ from the center rather than any distance.
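For the API timing example, a minimal sketch of computing these summaries from repeated runs might look like the following. The run times are hypothetical numbers chosen to include one slow outlier, the 20 ms timeout is an assumption, and because measured runtimes are continuous, the mode is taken after rounding into 1 ms bins (an arbitrary choice).

```python
import statistics

def summarize_runtimes(samples_ms, timeout_ms, bin_ms=1):
    """Summarize repeated run times (in milliseconds) with each measure of center.

    Runtimes are continuous, so for the mode they are first rounded into
    bins of width bin_ms (an assumption; any sensible binning works).
    """
    binned = [round(s / bin_ms) * bin_ms for s in samples_ms]
    return {
        "mean": statistics.mean(samples_ms),
        "median": statistics.median(samples_ms),
        "midrange": (min(samples_ms) + max(samples_ms)) / 2,
        "worst": max(samples_ms),
        "mode_bin": statistics.mode(binned),
        # Fraction of runs that returned within the timeout.
        "within_timeout": sum(s <= timeout_ms for s in samples_ms) / len(samples_ms),
    }

# Hypothetical run times: mostly ~10 ms with one slow outlier.
runs = [10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 48.0]
for name, value in summarize_runtimes(runs, timeout_ms=20).items():
    print(f"{name:>14}: {value:.2f}")
```

With these made-up numbers, the single 48 ms run pulls the mean up to about 15.5 ms and the midrange to 28.9 ms, while the median stays at 10.1 ms and six of the seven runs finish within the timeout.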

There are other measures, such as:

Additionally, you can take the mean of functions (non-negative ones too). See J.D. Cook’s blog again.

# Probability: teaching, Bayesians vs. frequentists, etc.

http://lesswrong.com/lw/1gc/frequentist_statistics_are_frequently_subjective/

I see this kind of reasoning at the core of the criticism of standard null hypothesis testing in financial models, as this blog post argues:
http://epchan.blogspot.in/2013/01/the-pseudo-science-of-hypothesis-testing.html

I think the core error is the same in both cases: trying to draw inferences from probability calculations that ignore conditional probabilities, or that treat them no differently from other probabilities.
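A concrete way to see the gap: a p-value is a probability computed assuming the null hypothesis H0 is true, which is not the probability that H0 is true given a significant result. Getting from one to the other needs Bayes’ theorem and a prior. A minimal sketch, where the 10% prior share of real effects, the 5% significance level, and the 80% power are all assumed, illustrative numbers:

```python
def prob_null_given_significant(prior_true_effect, alpha, power):
    """P(H0 true | significant result), by Bayes' theorem.

    alpha = P(significant | H0 true), power = P(significant | H0 false).
    """
    p_h0 = 1 - prior_true_effect
    p_significant = p_h0 * alpha + prior_true_effect * power
    return p_h0 * alpha / p_significant

# Hypothetical inputs: 10% of tested hypotheses are real effects,
# 5% significance level, 80% power.
print(prob_null_given_significant(0.10, 0.05, 0.80))
```

With those assumptions, about 36% of significant results come from true nulls, far more than the 5% a naive reading of the significance level would suggest.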

Now, I have deliberately stayed out of the finance sector as a field of employment. I never really questioned why, but I am beginning to understand. I actually like money, I’m a reasonable saver, and I like mathematics, so the sector has been, and perhaps still is, a perennial attraction; it does pay a hell of a lot more.
But I am beginning to realize why I have instinctively flinched from it. The most available jobs are in accounting and customer relations; I don’t have much stomach for the routine of accounting, and I’m no good at customer relations. Beyond those, the jobs and openings are myriad, at higher and higher levels of abstraction, like:
1. Risk analysis
2. Portfolio management

etc.

In fact, I think organizations that normalize employee ratings and the like have the same problem. My objection isn’t to having all of their employee ratings fit a normal curve overall; it’s to tweaking the ratings to fit the normal curve exactly at each reporting level, since a single team is far too small a sample to be expected to match the curve. That’s just a stupid, crazy application of standards and rules.
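To put a rough number on why per-team fitting is a problem, here is a sketch under loudly simplified assumptions: the organization-wide quota is a hypothetical 20% top / 70% middle / 10% bottom split, and each member of a 10-person team is treated as an independent draw from that organization-wide distribution. The function name is made up for illustration; it just evaluates the multinomial probability that the team lands exactly on its quota.

```python
from math import factorial

def prob_exact_quota(quota, shares):
    """Multinomial probability that independent draws land exactly on `quota`.

    quota:  counts per rating bucket, e.g. (2, 7, 1) for a 10-person team
    shares: organization-wide probability of each bucket, e.g. (0.2, 0.7, 0.1)
    """
    n = sum(quota)
    coefficient = factorial(n)
    probability = 1.0
    for count, share in zip(quota, shares):
        coefficient //= factorial(count)  # stays an exact integer
        probability *= share ** count
    return coefficient * probability

print(prob_exact_quota((2, 7, 1), (0.2, 0.7, 0.1)))
```

Under these assumptions the answer is about 0.119: even when the whole organization matches the curve perfectly, nearly 90% of 10-person teams would not hit the 2/7/1 split on their own, so forcing each one onto it means adjusting most teams’ ratings.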

Also, despite having a master’s degree and a bachelor’s in engineering, having read a lot of science publications, and having definitely studied for exams, I never really understood the significance of p-values. I don’t remember studying them well, and even if we covered them in some statistics course, I don’t think they made sense to me. I must look it up some other time.
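For the record, the usual definition is: a p-value is the probability, computed under the assumption that the null hypothesis is true, of seeing a result at least as extreme as the one actually observed. A minimal sketch for a hypothetical coin-tossing experiment (null hypothesis: the coin is fair), using a one-sided test:

```python
from math import comb

def binomial_p_value(heads, flips):
    """One-sided p-value: P(at least `heads` heads in `flips` tosses | fair coin)."""
    extreme_outcomes = sum(comb(flips, k) for k in range(heads, flips + 1))
    return extreme_outcomes / 2 ** flips

print(binomial_p_value(60, 100))
```

For 60 heads in 100 tosses this gives roughly 0.028; it says how surprising 60 heads would be for a fair coin, not how likely it is that the coin is fair.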

(Obliquely related)
Probability by stories:

I came across this way of teaching probability theory through stories.
See here.

On my first read of the story, my first thought was that it’s awfully Bayesian-biased.
I soon realized I had never studied probability formally, and definitely never beyond the dice/coin-toss examples.
I have read about it here and there (LW, NNT, EY, and other blogs) and knew there were three different interpretations,
but I was never sure what those three were.

Anyway, the blog defines the ‘classical’ view as chalkboard situations, where we naively assume all outcomes are equally likely.
Now, that’s a category NNT would have called dangerously academic. (I am somewhat skeptical of this definition.)

The ‘empirical’ view relies on real-world frequencies.
(Based on the examples, it’s more like projecting past observations into the future.)
Again, that sounds dangerously naive, simply because it’s extrapolation with implicit static/linear assumptions.

The ‘subjective’ view aims to express the uncertainty in our minds, and is therefore harder to define.
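To make the first two views concrete, here is a small sketch for a made-up question: the probability that two dice sum to 7. The classical view enumerates equally likely outcomes and counts; the empirical view estimates the same number from observed frequencies. Here the “observations” are simulated rolls (an assumption standing in for real-world data), so this only shows the mechanics, not the extrapolation risk mentioned above.

```python
import random
from fractions import Fraction
from itertools import product

# Classical view: enumerate equally likely outcomes and count.
outcomes = list(product(range(1, 7), repeat=2))
classical = Fraction(sum(1 for a, b in outcomes if a + b == 7), len(outcomes))

# Empirical view: estimate the same probability from observed frequencies
# (simulated rolls stand in for real-world observations).
rng = random.Random(42)
trials = 100_000
hits = sum(1 for _ in range(trials) if rng.randint(1, 6) + rng.randint(1, 6) == 7)
empirical = hits / trials

print(classical, empirical)
```

The classical answer is exactly 1/6; the empirical estimate lands close to it, but only because the simulated dice really are fair.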

I am now finding all of these views rather useless.
At this point I’m not sure what the point of these theoretical differences is,
as they don’t seem to have any effect on practice (i.e., reasoning with probabilities).

After reading the rest of the series, I understand why people are so divided on these interpretations.
But overall, I think these should be personal preferences, ultimately irrelevant to making a tight argument (which should be based on the theorems).

# Python and Church numerals

This is a cool piece of Python code by Vivek Haldar that shows off Python’s first-class functions.

It’s perhaps not very useful (except maybe for some long-integer handling, and even there it isn’t efficient), but it’s a cool piece of code nevertheless.
It’s cool because it illustrates two things:
1. Python’s first-class functions and lambdas let you stack up a bunch of function calls and evaluate them only at the end.
2. Natural numbers can be represented as sets, with an operator that adds an element to the set, or as a function composed with itself n times and applied to an initial value.
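Point 2 can be made concrete with a short sketch of both representations. The set version below is the von Neumann construction (0 is the empty set and succ(n) = n ∪ {n}), one standard way to build the naturals from sets; the function version is a loop-based Church numeral rather than the lambda-only style of the code that follows. The function names are hypothetical.

```python
# Representation A (von Neumann sets): 0 is the empty set and
# succ(n) = n ∪ {n}, so the numeral for k is a set with exactly k elements.
def set_succ(n):
    return n | frozenset([n])

def set_numeral(k):
    n = frozenset()
    for _ in range(k):
        n = set_succ(n)
    return n

# Representation B (Church): the numeral for k means "apply f k times to x".
def church_numeral(k):
    def apply_k_times(f):
        def run(x):
            for _ in range(k):
                x = f(x)
            return x
        return run
    return apply_k_times

print(len(set_numeral(3)))                    # 3: the size of the set is the number
print(church_numeral(3)(lambda x: x + 1)(0))  # 3: count the applications of f
```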

Here’s the code:

```python
# Copied from (https://gist.github.com/2438498)
# A Church numeral n is a function that applies f to x n times.
zero = lambda f: lambda x: x
succ = (lambda n: lambda f: lambda x: f(n(f)(x)))
one = succ(zero)
# m + n: apply f m times, then n more times.
add = (lambda m: lambda n: lambda f: lambda x: n(f)(m(f)(x)))
# m * n: apply "f applied m times" n times.
mult = (lambda m: lambda n: lambda f: lambda x: n(m(f))(x))
# m ** n: compose m with itself n times.
exp = lambda m: lambda n: n(m)

# Convert back to a Python int by counting applications of plus1 to 0.
plus1 = lambda x: x + 1
church2int = lambda n: n(plus1)(0)

def int2church(i):
    if i == 0:
        return zero
    else:
        return succ(int2church(i - 1))

def peval(s):
    # Print an expression next to its evaluated value.
    print(s, ' = ', eval(s))

peval('church2int(zero)')
peval('church2int(succ(zero))')
peval('church2int(one)')
peval('church2int(succ(one))')
peval('church2int(succ(succ(one)))')
peval('church2int(succ(succ(succ(one))))')
peval('church2int(add(one)(succ(one)))')
peval('church2int(add(succ(one)) (succ(one)))')
peval('church2int(add(succ(one)) (succ(one)))')
peval('church2int(mult(succ(one))(succ(one)))')
peval('church2int(exp(succ(one))(succ(one)))')
peval('church2int(int2church(0))')
peval('church2int(int2church(1))')
peval('church2int(int2church(111))')
c232 = int2church(232)
c421 = int2church(421)
peval('church2int(mult(c232)(c421))')
print("232*421 = ", 232 * 421)
c2 = int2church(2)
c10 = int2church(10)
peval('church2int(exp(c2)(c10))')
print('2**10 = ', 2 ** 10)
```

Now, the coolest part is how he builds up expressions using lambdas (Python’s anonymous functions).
This is also why I tend to frown when someone calls Python an object-oriented language.

I mean, it’s as much a function-oriented language as it is an object-oriented one.
The two aren’t necessarily mutually exclusive, but calling a language X-oriented implies, or at least entails,
that design debates (ones without clear data points to swing them either way) tend to be settled in favor of X.

I have used Python for about 5 years and lurked on the python-ideas, python-users, and python-core-mentorship mailing lists,
and I can’t find that strong a bias in any of these places.
I can perhaps claim Python is more (English language specific?) than any other paradigm.
It’s something I have argued about with some interviewers in the past,
but very few of them even try to explain why they call it object-oriented.
(A sign they don’t know what they are talking about?)

I know the Python homepage calls it an object-oriented language, but that doesn’t by itself make the language object-oriented.
Besides, I don’t worry too much about the orientation of a language, person, or cat.
I think the point really comes down to this: what properties are you implying/inferring when you say object-oriented?
What properties does your application need that depend on the choice of language?
Once you can answer parts of these questions, you’ll have an idea of which language to choose.

# Definition

Transcendental numbers are defined as numbers that are not algebraic; in other words, numbers that are not a root of any non-zero polynomial equation with rational coefficients.

Corollary 1: Since every rational number p/q is algebraic (it is the root of qx - p = 0), all real transcendental numbers are irrational.

# Proven transcendental numbers

1. $e^a$ – where $a$ is algebraic and nonzero.
2. $\pi$
3. $e^{\pi}$
4. $a^b$ – where $a$ is algebraic but not 0 or 1, and $b$ is irrational algebraic.
5. $\sin a$, $\cos a$, $\tan a$, and their multiplicative inverses, for any nonzero algebraic number $a$.
6. $\ln(a)$ – where $a$ is algebraic and not equal to 0 or 1, for any branch of the logarithm function.
7. $W(a)$ – where $a$ is algebraic and nonzero, for any branch of the Lambert W function.
8. $\sum_{j=1}^{\infty} \beta^{2^j}$ – where $0 < |\beta| < 1$ and $\beta$ is algebraic.
9. $\sum_{j=1}^{\infty} \beta^{j!}$ – where $0 < |\beta| < 1$ and $\beta$ is algebraic.

# The Mountain Where Rain Never Falls (Math with Bad Drawings)

The sixth in a series of seven fables/lessons/meditations on probability.

Another day of hiking brought the teacher and the student to an empty hut by a mountain stream. “We will rest here a while, and wash our clothes,” the teacher said. When they had laid their clean clothes on sunny rocks to dry, the student pointed to the clouds gathering in the valley below. “Looks like rain. Should we be worried?”

“The rains have reached this place only once in the last 100 years,” the teacher said. “What is the probability that they will reach us today?”


# Quaternions vs. octonions

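Octonions:
non-associative, non-commutative division algebra
an octet of 8 real numbers (x0, x1, …, x7), one for each basis element (e0, e1, …, e7)
x = Σ x_i e_i, where i = 0 to 7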

Octonions == pairs of quaternions
Quaternions == pairs of complex numbers

Quaternions:
non-commutative division algebra
multiplication is associative but not commutative
i^2 = j^2 = k^2 = i*j*k = -1
H = a*1 + b*i + c*j + d*k
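The “pairs of complex numbers” view can be checked in a few lines. This sketch stores a quaternion as a pair of Python complex numbers and multiplies pairs with the Cayley–Dickson rule (a, b)(c, d) = (ac − d̄b, da + bc̄); sign and ordering conventions differ between sources, so treat this as one choice among several. With i = (i, 0), j = (0, 1) and k defined as ij, it reproduces Hamilton’s relations and shows that ij ≠ ji. Applying the same doubling rule to pairs of quaternions gives the octonions, where associativity is lost as well.

```python
def cd_mul(p, q):
    """Multiply quaternions stored as pairs of complex numbers (a, b), using
    the Cayley-Dickson rule (a, b)(c, d) = (ac - conj(d) b, d a + b conj(c)).
    Conventions vary between sources; this is one common choice."""
    a, b = p
    c, d = q
    return (a * c - d.conjugate() * b, d * a + b * c.conjugate())

I = (1j, 0j)
J = (0j, 1 + 0j)
K = cd_mul(I, J)  # define k as i*j

for name, x in [("i", I), ("j", J), ("k", K)]:
    print(name, "squared:", cd_mul(x, x))
print("ijk:", cd_mul(cd_mul(I, J), K))
print("ij:", cd_mul(I, J), "ji:", cd_mul(J, I))  # same magnitude, opposite sign
```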

So, the moral of the story? If you think complex numbers literally mean numbers with a real and an imaginary part, try naming these two families of numbers. The point is that the imaginary parts of complex numbers are no longer considered imaginary in the sense of “nonexistent in the world”; both parts are considered an inevitable part of the world as science understands it. See Scott Aaronson’s quantum theory lecture here.

# stationary process

It’s a stochastic (big word) process whose joint probability distribution (big words) does not change when shifted in time or space.

I think the big words that would throw off a regular reader are stochastic and joint probability distribution. There are also a couple of implicit assumptions, like:
1. The joint probability distribution is over the variables/parameters of the process.
2. These variables/parameters are random.

Stochastic process: this just means the process is modeled as a collection of random variables (usually indexed by time).

Joint probability distribution: this assigns a probability (or a probability density) to every combination of values that the n random variables can take together. You can picture it as a surface or volume over an n-dimensional space.
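A small, fully discrete example may help. This sketch uses two fair dice (an illustrative choice, not stock data) and tabulates the joint distribution of X = the first die and Y = the total; summing the joint probabilities over X recovers the marginal distribution of Y.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Joint distribution of X = first die and Y = total of both dice.
joint = Counter()
for a, b in product(range(1, 7), repeat=2):
    joint[(a, a + b)] += Fraction(1, 36)

# Marginal distribution of Y: sum the joint probabilities over all values of X.
marginal_y = Counter()
for (x, y), p in joint.items():
    marginal_y[y] += p

print(joint[(1, 7)], marginal_y[7], sum(joint.values()))
```

Every (X, Y) pair that can occur has probability 1/36, the marginal P(Y = 7) comes out to 1/6, and the whole table sums to 1.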

OK, OK, that’s all too symbolic. What does it correspond to in real life?

Well, these are all mathematical models used for various real-life phenomena. Here, let’s take stock market prices. What does it mean to say they are a stochastic process? What exactly is the joint probability distribution, and what the hell does a stationary process mean?

I’ll try to answer these as accurately as I can below, but keep in mind that I’m just an amateur who does this for fun. Don’t blame me if you apply my ideas and lose money.

To begin with, calling it a stochastic process means we are assuming that stock prices are generated by a random process. Note that this doesn’t translate to “we don’t know what generates it or how it changes”, but to “we can’t find any pattern that works repeatedly on stock price fluctuations, within reasonable error”.

The difference is subtle and easy to miss. We are building a model, and its goal is to describe the data we have seen more accurately. One useful application is predictive analytics, but with the stock market, as NNT has argued so loudly, we need to be very careful about fractal/viral effects and “black swan” effects.

Anyway, to get back to our original question: how do we know whether stock market prices are a stationary process? The answer turns out to be rather simple (at least in theory; I’ll get to what I know about practice in a minute). We pick a set of variables that describe stock market prices. Note that the actual trading price changes continuously, so we usually approximate it (with either the closing price or an average over some length of time). The point is that we need to find a set of variables to build a joint probability distribution from, and then transform or track them. There are quite a few available; some that many will recognize immediately are the P/E ratio, dividend history, transaction volume, and quarterly earnings.

How many variables you pick, and which ones, depends on the trade-offs you want to make for your model. Typical challenges are:
1. Computational capacity (CPU, memory, time, etc.)
2. Data availability
3. The uncertainty/entropy each variable adds to or removes from the model

Once you have these, you plot/fit them to a function over an n-dimensional space* and call it a joint probability distribution.

Now comes the stationarity test: you translate this function along one of the dimensions and see whether its overall properties change (e.g., properties like volume/area/length, convexity/concavity, topological properties like connectedness, fractal dimensions, etc.).**
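Shifting a whole fitted function is hard to do directly, but a common informal version of the same idea is to compute summary statistics over different stretches of time and see whether they drift. A minimal sketch on simulated data (an assumption; real prices would replace the simulated series), comparing white noise, which is stationary, with a random walk built from the same noise, which is not. The window size of 1000 is an arbitrary choice, and this is a rough check, not a formal test such as the augmented Dickey–Fuller test.

```python
import random
import statistics

def window_stats(series, window):
    """(mean, variance) of each consecutive, non-overlapping window."""
    return [
        (statistics.mean(series[i:i + window]), statistics.pvariance(series[i:i + window]))
        for i in range(0, len(series) - window + 1, window)
    ]

rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(4000)]  # white noise: stationary

walk, level = [], 0.0                           # random walk: not stationary
for step in noise:
    level += step
    walk.append(level)

for name, series in [("white noise", noise), ("random walk", walk)]:
    stats = window_stats(series, window=1000)
    print(name, [(round(m, 2), round(v, 2)) for m, v in stats])
```

The white-noise windows should all show a mean near 0 and a variance near 1, while the random walk’s window means wander off, which is exactly the kind of change under a shift that stationarity rules out.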

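The Wikipedia article talks about shifting in time, but I’m ignoring that, assuming time can be treated as one of these n dimensions. (Strictly, the definition shifts the time index: the joint distribution of (X(t1), …, X(tk)) must be the same as that of (X(t1 + τ), …, X(tk + τ)) for every shift τ and every choice of times.) The article probably frames it this way because stationarity is key terminology in time-series analysis.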

OK, so we know whether our model of a particular company’s stock price is stationary or not. What does this mean for my investing behaviour? How should I let it affect my investing behaviour, or should I not let it at all?

Now, that’s what a lot of investment analysts get paid for, and as a few of them have said openly, it’s worth applying some skeptical questioning to their recommendations. Mine, on the other hand, should either be dismissed altogether or be thought about: re-read, take notes on your thoughts, re-read again, etc.

In any case, my first instinct is that the outcome (here, the stock price as modeled by your chosen variables) behaves stably over time: under the assumption I have been running with, its statistical properties, such as its mean level and its volatility, stay the same under any shift. Note that this doesn’t mean zero volatility or a constant price; a stationary price can still fluctuate a lot, just with the same spread around the same level at every point in time. But stationarity is also not the binary property I implicitly treated it as in the text above. In practice it’s more useful as a matter of degree, measured by how much the shifted function mismatches the original relative to the amount of translation.
I’m guessing it would involve some form of partial derivative terms, but I don’t have the energy or time to look it up, understand it, and write it up. Besides, I’m already close to the 1000-word mark, so I’ll sign off now.
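One way to turn “how much the shifted function mismatches the original” into a number, as a sketch: take a reference window, shift it by different amounts, and measure the Kolmogorov–Smirnov distance (the largest gap between two empirical CDFs) between the reference and each shifted window. This compares only the one-dimensional distribution of values, not the full joint distribution; both series are simulated and the helper names are hypothetical.

```python
import random

def ks_distance(xs, ys):
    """Largest absolute gap between the empirical CDFs of two samples."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    gap = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Advance past every value <= v in both samples, then compare CDFs at v.
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        gap = max(gap, abs(i / n - j / m))
    return gap

def mismatch_by_shift(series, window, shifts):
    """KS distance between the first window and the window shifted by each amount."""
    reference = series[:window]
    return {s: ks_distance(reference, series[s:s + window]) for s in shifts}

rng = random.Random(1)
noise = [rng.gauss(0, 1) for _ in range(5000)]  # stationary
walk, level = [], 0.0                           # not stationary
for step in noise:
    level += step
    walk.append(level)

shifts = [1000, 2000, 4000]
print("white noise:", mismatch_by_shift(noise, 1000, shifts))
print("random walk:", mismatch_by_shift(walk, 1000, shifts))
```

For the white noise the distances stay small at every shift; for the random walk they are typically much larger, which is one concrete reading of stationarity as a matter of degree.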

With these conclusions:
1. You can check a function for stationarity and, based on that, make predictions about future variation with respect to the parameters involved.
2. You can model any company’s stock price in terms of parameters you picked, and predict future prices from changes in those parameters. (This can also be used to test the model’s accuracy; I think that’s called backtesting.)
3. Stationarity is an interesting concept, or standard, for measuring the accuracy of your simulation models.

* — I suspect in practice it comes down to 3 or 4 at most, but I am just conjecturing there.
** — Most likely, in this case, only the area/volume/length properties.

# Miller-Rabin Primality Test (Math ∩ Programming)

Problem: Determine if a number is prime, with an acceptably small error rate.

Solution: (in Python)

Discussion: This algorithm is known as the Miller-Rabin primality test, and it was a very important breakthrough in the study of probabilistic algorithms.

Efficiently testing whether a number is prime is a crucial problem in cryptography, because the security of many cryptosystems depends on the use of large randomly chosen primes. Indeed, we’ve seen one on this blog already which is in widespread use: RSA. Randomized algorithms also have quite useful applications in general, because it’s often the case that a solution which is correct with probability, say, $1 - 2^{-100}$ is good enough for practice.
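The original post’s Python solution isn’t reproduced in this excerpt. For reference, here is a minimal sketch of the standard Miller-Rabin test; it is not the original post’s code, and the small-prime pre-check and the default of 40 rounds are arbitrary choices. For a composite n, each round wrongly passes with probability at most 1/4, so 40 rounds bound the error by 4^-40.

```python
import random

def is_probable_prime(n, rounds=40, rng=random):
    """Miller-Rabin test. False means n is definitely composite; True means n is
    prime, except that a composite slips through with probability <= 4**-rounds."""
    if n < 2:
        return False
    # Trial division by a few small primes (an arbitrary speed-up).
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # Write n - 1 = 2**s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True

print(is_probable_prime(2**61 - 1))                  # a known Mersenne prime
print(is_probable_prime((2**61 - 1) * (2**31 - 1)))  # composite
```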

But from a theoretical and historical perspective, primality testing lay at the center of a huge problem in complexity theory. In particular, it is unknown whether algorithms which have access to randomness and can output probably correct answers are more…
