The Unscientific Analysis of Languages popular with Indian Startups

April 21, 2015

Well, all this started with this one tweet:

disruptive, IIM, ninja, passion. leverage, LSD, enterprise. this takes buzzwords to whole next level and more https://t.co/BTKniycDxa
— Sathya (@SathyaBhat) March 9, 2015

And it finally ended with this one:

@jackerhack thanks, got it @SathyaBhat @mehulved
— 100rabh™ (@the100rabh) March 9, 2015

And so I had the entire dump of Hasjob postings. It was pretty cool of Kiran to send them across, saving me the time and effort of scraping the data. At that time I had very little idea what I would do with it. I know a little R, and this seemed like the moment to put that little knowledge to use.

So I got on with it.

Step 1: Set up R on my system
Step 2: Write some code to extract the data and cleanse it
Step 3: Generate the counts for words
Step 4: Manually pick up the technology words with counts
Step 5: Generate the image with language popularity

So as it stands, the top 5 technologies in demand at Indian startups are:

1. PHP
2. Android
3. Ruby
4. iOS
5. JavaScript

Surprised? No? Well, I am, because the one technology no one talks about, PHP, seems to be heavily used by Indian startups. The rest sound very reasonable to me. What do you guys think?

The following is the code I wrote to extract the results. Let me know if I am missing something.

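# One-time setup: tm for text mining, RColorBrewer for plot colours
install.packages("tm")
install.packages("RColorBrewer")
library(NLP)
library(tm)
library(RColorBrewer)
library(MASS)  # write.matrix() lives in MASS

# hasjob.content (the Hasjob dump as a data frame) and targetPath (the
# output directory) are assumed to be defined before this point.
corpus <- Corpus(VectorSource(hasjob.content$headline))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

# Rows are terms, columns are job postings
td.mat <- as.matrix(TermDocumentMatrix(corpus))
write.matrix(format(td.mat, scientific = FALSE),
             file = paste(targetPath, "data.csv", sep = "/"), sep = ",")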
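
The code above stops at writing out the term-document matrix, so here is a rough sketch of how Steps 3 and 4 (counting words, then keeping only the technology terms) could look in plain base R. This is not the exact code I ran: the headlines are made-up toy data standing in for hasjob.content$headline, and the names headlines, word.counts, tech.terms and tech.counts are only for illustration.

```r
# Toy headlines standing in for hasjob.content$headline (made-up data)
headlines <- c(
  "Senior PHP Developer",
  "Android developer, Bangalore",
  "PHP / JavaScript engineer",
  "Ruby on Rails hacker",
  "iOS and Android developer"
)

# Step 3: lower-case, turn punctuation into spaces, split on whitespace,
# then count how often each word occurs across all headlines
cleaned <- gsub("[[:punct:]]", " ", tolower(headlines))
words <- unlist(strsplit(cleaned, "[[:space:]]+"))
words <- words[nzchar(words)]  # drop empty strings left over from splitting
word.counts <- sort(table(words), decreasing = TRUE)

# Step 4: keep only a hand-picked list of technology terms
# (the list itself is an assumption; in practice it was picked by eye)
tech.terms <- c("php", "android", "ruby", "ios", "javascript")
tech.counts <- word.counts[names(word.counts) %in% tech.terms]
print(tech.counts)
```

Stop words are not removed here because the hand-picked list filters them out anyway. On the real data, the same counts should also be obtainable from the matrix above with rowSums(td.mat), and something like barplot(tech.counts) would be one simple way to draw the popularity chart for Step 5.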
