# Topological maps or topographic maps?

While surfing the web the other day I read an article in which the author refers to a “topological map.” I think it is safe to say that he meant to write “topographic map.” This is an error I’ve seen many times before.

A topographic map is a map of a region that shows changes in elevation, usually with contour lines indicating different fixed elevations. This is a map that you would take on a hike.

A topological map is a continuous function between two topological spaces—not the same thing as a topographic map at all!

I thought for sure that there was no cartographic meaning for topological map. It turns out, however, that there is.

A topological map is a map that is only concerned with relative locations of features on the map, not on exact locations. A famous example is the graph that we use to…

View original post 95 more words

# Python’s Weak Performance Matters

Here is an argument I used to make, but now disagree with:

Just to add another perspective, I find many “performance” problems in
the real world can often be attributed to factors other than the raw
speed of the CPython interpreter. Yes, I’d love it if the interpreter
were faster, but in my experience a lot of other things dominate. At
least they do provide low hanging fruit to attack first.

[…]

But there’s something else that’s very important to consider, which
rarely comes up in these discussions, and that’s the developer’s
productivity and programming experience.[…]

This is often undervalued, but shouldn’t be! Moore’s Law doesn’t apply
to humans, and you can’t effectively or cost efficiently scale up by
throwing more bodies at a project. Python is one of the best languages
(and ecosystems!) that make the development experience fun, high
quality, and very efficient.

(from Barry Warsaw)

I…

View original post 705 more words

# Data Science: Structured thinking, a collection of guides

Inspired by this. Read it first: http://www.analyticsvidhya.com/blog/2013/06/art-structured-thinking-analyzing/

1. Figure out the questions involved in the analytics project and decide which ones can be tackled separately, which ones are intertwined with others, and which ones need to be answered before tackling the others. Then pick one.
0.5 Lay out the data requirements and hypotheses before looking at what data is available.
2. Actually look at the data summary (`dataframe.describe()`, which includes the mean, std, and quartiles; the mode has to be computed separately).
3. Look for patterns in the summary. Think about what each of the values means for your question. What questions do they lead to? How do they modify your question?
4. Figure out the ML problem (use this).
5. Go back to steps 1 and 2 and redo them with the ML problem in mind.
6. See if you have enough data (noise vs. signal), or whether you need more samples or more features (see http://scikit-learn.org/stable/modules/feature_selection.html).
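Step 2 above can be sketched with pandas. This is a minimal illustration, assuming a DataFrame is already loaded; the column name `value` and the toy data are invented for the example.

```python
# Minimal sketch of the "look at the data summary" step with pandas.
# The DataFrame and the column name "value" are made up for illustration.
import pandas as pd

df = pd.DataFrame({"value": [1.0, 2.0, 2.0, 3.0, 10.0]})

summary = df.describe()    # count, mean, std, min, quartiles, max
mode = df["value"].mode()  # describe() does not report the mode

print(summary)
print("mode:", mode.iloc[0])
```

The quartiles in the output (rows `25%`, `50%`, `75%`) are where to start looking for the patterns mentioned in step 3.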

First model-building time split:
1. Descriptive analysis of the data – 50% of time
2. Data treatment (missing value and outlier fixing) – 40% of time
3. Data modelling – 4% of time
4. Estimation of performance – 6% of time

Data exploration steps:
Source reference: https://www.analyticsvidhya.com/blog/2016/01/guide-data-exploration/
Below are the steps involved to understand, clean, and prepare your data for building your predictive model:

1. Variable identification
2. Univariate analysis
3. Bi-variate analysis
4. Missing values treatment
5. Outlier treatment
6. Variable transformation
7. Variable creation

Missing value treatment:
1. Deletion
2. Mean/mode/median imputation
3. Prediction model
4. KNN imputation

Outlier treatment (common sources of outliers):
1. Data entry errors
2. Measurement error
3. Experimental error
4. Intentional outlier
5. Data processing error
6. Sampling error
7. Natural outlier
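The first two missing-value treatments above (deletion and mean/median imputation) can be sketched with pandas alone. The column names and toy values here are invented for illustration.

```python
# Minimal sketch of missing-value treatment with pandas.
# Columns "age" and "income" and their values are made up for illustration.
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25.0, 30.0, np.nan, 35.0],
                   "income": [40.0, np.nan, 60.0, 80.0]})

# Treatment 1, deletion: drop any row that has a missing value.
dropped = df.dropna()

# Treatment 2, mean imputation: fill each column's NaNs with its mean.
mean_filled = df.fillna(df.mean())

# Median imputation is the same idea but more robust to outliers.
median_filled = df.fillna(df.median())
```

Prediction-model and KNN imputation (treatments 3 and 4) are more involved; scikit-learn's `KNNImputer` is one place to look for the latter.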

# Exploring the ChestXray14 dataset: problems

A couple of weeks ago, I mentioned I had some concerns about the ChestXray14 dataset. I said I would come back when I had more info, and since then I have been digging into the data. I’ve talked with Dr Summers via email a few times as well. Unfortunately, this exploration has only increased my concerns about the dataset.

##### DISCLAIMER: Since some people are interpreting this wrongly, I do not think this piece in any way reflects broader problems in medical deep learning, or suggests that claims of human performance are impossible. I’ve made a claim like that myself, in recent work. These results are specific to this dataset, and represent challenges we face with medical data. Challenges…

View original post 5,094 more words

# Word_2_vector (aka word embeddings)

## Word 2 vector:

• word2vec is a way to take a large corpus of text and convert it into a matrix with one word per row.
• It is a shallow neural network (2 layers).
• There are two training methods:

### CBOW (Continuous Bag-of-Words)

• a text is represented as the bag (multiset) of its words
• disregards grammar
• disregards word order but keeps multiplicity
• also used in computer vision

### Skip-gram

It is a generalization of n-grams (which is basically a Markov chain model of order n-1).

• It is an (n-1)-order Markov model
• Used in protein sequencing, DNA sequencing, and computational linguistics (character- and word-level)
• Models sequences using the statistical properties of n-grams
• Predicts $x_i$ based on $x_{i-(n-1)}, \dots, x_{i-1}$
• In language modeling, independence assumptions are made so that each word depends only on the previous n-1 words (or characters, in the case of character-level modeling)
• The probability of a word conditional on the previous n-1 words follows a categorical distribution
• In practice, the probability distributions are smoothed by assigning non-zero probabilities to unseen words or n-grams
• Finding the right ‘n’ for a model comes down to the bias vs. variance tradeoff we’re willing to make
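The Markov idea above can be sketched as a 2-gram (bigram) model: predict $x_i$ from $x_{i-1}$ using counted conditional probabilities. The toy corpus is invented for illustration.

```python
# Minimal bigram sketch: estimate P(next word | previous word) by counting.
# The toy corpus is made up for illustration.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def p(nxt, prev):
    """P(next word | previous word) as a relative frequency."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0
```

Note that an unseen bigram gets probability 0.0 here, which is exactly the problem the smoothing techniques below address.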

## Smoothing Techniques:

• Problem of balancing weight between infrequent n-grams.
• Unseen n-grams get probability 0.0 by default, without smoothing.
• Use pseudocounts for unseen n-grams (generally motivated by Bayesian reasoning on the sub-n-grams, for n smaller than the original n).

• Skip-grams also allow the possibility of skipping: a 1-skip bigram would create bigrams while skipping the second word in a three-word sequence.

• Could be useful for languages with a less strict subject-verb-object order than English.
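The pseudocount idea above is, in its simplest form, add-one (Laplace) smoothing: add a constant k to every bigram count so unseen bigrams get non-zero probability. The toy counts below are invented for illustration.

```python
# Minimal sketch of add-k (Laplace) smoothing for bigrams.
# The vocabulary and counts are made up for illustration.
from collections import Counter

vocab = {"the", "cat", "sat"}
bigram_counts = Counter({("the", "cat"): 2, ("cat", "sat"): 1})
unigram_counts = Counter({"the": 2, "cat": 2, "sat": 1})

def p_smoothed(prev, nxt, k=1):
    """P(nxt | prev) with pseudocount k added to every possible bigram."""
    return (bigram_counts[(prev, nxt)] + k) / (unigram_counts[prev] + k * len(vocab))
```

With k=1, the unseen bigram ("sat", "the") now gets probability 1/4 instead of 0, at the cost of shaving some probability mass off the seen bigrams.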

• Depends on the distributional hypothesis.
• Vector representations of words are called “word embeddings”.
• Basic motivation: compared to the audio and visual domains, the word/text domain treats words as discrete symbols, encoding them as sparse data. Vector-based representations work around these issues.
• Also called vector space models.
• Two ways of training: (a) the CBOW (Continuous Bag-of-Words) model predicts a target word given a group of context words; (b) skip-gram is the reverse, i.e. it predicts the group of context words from a given word.

• Trained using maximum likelihood.

• Ideally, it maximizes the probability of the next word given the previous ‘h’ words, expressed as a softmax function.
• However, calculating the softmax requires computing and normalizing a score against every other word in the vocabulary at every step.
• Therefore a logistic regression (binary classification) objective function is used instead.
• The way this is achieved is called negative sampling.
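The negative-sampling objective can be sketched in a few lines of numpy: for one (target, context) pair, treat the true context word as a positive example and a few sampled words as negatives, and score each with a sigmoid instead of a full softmax. The dimensions, seed, and word indices below are invented for illustration.

```python
# Minimal numpy sketch of the negative-sampling loss for one training pair.
# Vocabulary size, embedding dimension, and word indices are made up.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 20, 8
W_in = rng.normal(scale=0.1, size=(vocab_size, dim))   # target-word embeddings
W_out = rng.normal(scale=0.1, size=(vocab_size, dim))  # context-word embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(target, context, negatives):
    """-log sigma(v_ctx . v_tgt) - sum_n log sigma(-v_n . v_tgt)"""
    v_t = W_in[target]
    pos = sigmoid(W_out[context] @ v_t)        # true context: push toward 1
    negs = sigmoid(-W_out[negatives] @ v_t)    # sampled words: push toward 0
    return -np.log(pos) - np.sum(np.log(negs))

loss = neg_sampling_loss(target=3, context=5, negatives=[1, 7, 9])
```

The cost of each update now scales with the handful of negative samples rather than with the whole vocabulary, which is the point of replacing the softmax.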

# Seven secrets of Shiva

I thought of you when I read this quote from “Seven secrets of Shiva” by Devdutt Pattanaik:

“Culture by its very nature makes room for some practices and some people, and excludes others. Thieves and criminals and ghosts and goblins have no place in culture.”