Word_2_vector.. (aka word embeddings)

Word 2 vector:

  • word 2 vector is a way to take a big set of text and convert into a matrix with a word at
    each row.
  • It is a shallow neural-network(2 layers)
  • Two options/training methods (

CBOW(Continuous-bag-of-words assumption)

  • — a text is represented as the bag(multiset) of its words
  • — disregards grammar
  • — disregards word order but keeps multiplicity
  • — Also used in computer vision

skip-gram() — it is a generalization

of n-grams(which is basically a markov chain model, with (n-1)-order)
* — It is a n-1 order markov model
* — Used in Protein sequencing, DNA Sequencing, Computational
linguistics(character and word)
* — Models sequences, using the statistical properties of n-grams
* — predicts x_i based on x_(i-(n-1)), ....,x_(i-1) .
* — in language modeling independence assumptions are made so that each
word depends only on n-1 previous words.(or characters in case of
character level modeling)
* — The probability of a word conditional on previous n-1 words follows a
Categorical Distribution
* — In practice, the probability distributions are smoothed by assigning non-zero probabilities to unseen words or n-grams.

Bias-vs-Variance Tradeoff:

  • — Finding the right ‘n’ for a model is based on the Bias Vs Variance tradeoff we’re wiling to make

Smoothing Techniques:

  • — Problems of balance weight between infrequent n-grams.
  • — Unseen n-grams by default get 0.0 without smoothing.
  • — Use pseudocounts for unseen n-grams.(generally motivated by
    bayesian reasoning on the sub n-grams, for n < original n)

  • — Skip grams also allow the possibility of skipping. So a 1-skip bi(2-)gram would create bigrams while skipping the second word in a three sequence.

  • — Could be useful for languages with less strict subject-verb-object order than English.

Alternative link

  • Depends on Distributional Hypothesis
  • Vector representations of words called “word embeddings”
  • Basic motivation is that compared to audio, visual domains, the word/text domain treats
    them as discrete symbols, encoding them as sparse dataset. Vector based representation
    works around these issues.
  • Also called as vector Space models
  • Two ways of training: a, CBOW(Continuous-Bag-Of-Words) model predicts target words, given
    a group of words, b, skip-gram is ulta. aka predicts group of words from a given word.

  • Trained using the Maximum Likelihood model

  • Ideally, Maximizes probability of next word given the previous ‘h’ words in terms of a softmax function
  • However, calculating the softmax values requires computing and normalizing each probability using score for all the other words in context at every step.
  • Therefore a logistic regression aka binary classification objective functionis used.
  • The way this is achieved is called negative sampling

Seven secrets of shiva.

I thought of you when I read this quote from “Seven secrets of Shiva” by DEVDUTT PATTANAIK –

“Culture by its very nature makes room for some practices and some people, and excludes others. Thieves and criminals and ghosts and goblins have no place in culture.”

Start reading this book for free: http://amzn.in/5sGSMqm

A Data Driven Guide to Becoming a Consistent Billionaire

The Art and Science of Data

Did You Really Think All Billionaires Were the Same?

Recently, I became a bit obsessed with the one percent of the one percent – Billionaires. I was intrigued when I stumbled on articles telling us who and what billionaires really are. The articles said stuff like: Most entrepreneurs do not have a degree and the average billionaire was in their 30s before starting their business. I felt like this was a bit of a generalization and I’ll explain. Let’s take a look at Bill Gates and Hajime Satomi, the CEO of Sega. Both are billionaires but are they really the same? In the past decade, Bill Gates has been a billionaire every single year while Hajime has dropped off the Forbes’ list three times. Is it fair to put these two individuals in the same box, post nice articles and give nice stats when no one wants to be a…

View original post 1,457 more words