The Unscientific Analysis of Languages Popular with Indian Startups
Well, it all started with this one tweet:
disruptive, IIM, ninja, passion. leverage, LSD, enterprise.
this takes buzzwords to whole next level and more https://t.co/BTKniycDxa
— Sathya (@SathyaBhat) March 9, 2015
And it finally ended with this one:
@jackerhack thanks, got it @SathyaBhat @mehulved
— 100rabh™ (@the100rabh) March 9, 2015
And I had the entire dump of Hasjob postings. It was pretty cool of Kiran to send them across to me, saving me the time and effort of scraping the data myself. At the time I had very little idea what I would do with it, but I knew a little R, and this seemed like the moment to put that knowledge to use.
So I got on with it.
Step 1: Set up R on my system
Step 2: Write some code to extract and clean the data
Step 3: Generate word counts
Step 4: Manually pick out the technology words and their counts
Step 5: Generate an image showing language popularity
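The word-counting part of the steps above (Steps 2 to 4) can be sketched in plain base R, without the tm package. This is a minimal illustration, not the code used for this post: the sample headlines and the `tech.words` list are hypothetical, and the manual pick of Step 4 is approximated by matching against that list.

```r
# Hypothetical sample headlines standing in for the Hasjob dump
headlines <- c(
  "Senior PHP Developer",
  "Android and iOS engineer",
  "Ruby on Rails / JavaScript developer",
  "PHP ninja wanted!"
)

# Step 2: replace punctuation with spaces, lower-case, split into words
words <- unlist(strsplit(tolower(gsub("[[:punct:]]", " ", headlines)), "\\s+"))
words <- words[words != ""]  # drop any empty tokens

# Step 3: count how often each word occurs, most frequent first
counts <- sort(table(words), decreasing = TRUE)

# Step 4: keep only the technology words (hypothetical hand-picked list)
tech.words <- c("php", "android", "ruby", "ios", "javascript")
tech.counts <- counts[names(counts) %in% tech.words]
print(tech.counts)
```

Matching against a fixed list keeps the result reproducible, at the cost of missing technologies you did not think to include.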
So as it stands, the top 5 technologies required by Indian startups are:
1. PHP
2. Android
3. Ruby
4. iOS
5. JavaScript
Surprised? No? I am, because the one technology no one talks about, yet which tops the Indian startup job postings, is PHP. The rest sound very reasonable to me. What do you guys think?
Below is the code I wrote to extract the results. Let me know if I am missing something.
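# Run once: installing tm also installs its dependency NLP
install.packages("tm")
install.packages("RColorBrewer")

library(NLP)
library(tm)
library(RColorBrewer)

# hasjob.content (data frame of postings with a `headline` column) and
# targetPath (output directory) are set up earlier; that code is not shown
corpus <- Corpus(VectorSource(hasjob.content$headline))

# Normalise the text: lower-case, strip punctuation, drop English stop words
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

# Rows are terms, columns are postings, cells are per-posting word counts
td.mat <- as.matrix(TermDocumentMatrix(corpus))

# write.csv keeps the row names, i.e. the terms, as the first column
write.csv(td.mat, file = file.path(targetPath, "data.csv"))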
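The code above stops once the term-document matrix is written out. A minimal sketch of how Steps 3 to 5 could continue in base R, assuming the matrix has terms as row names (as `TermDocumentMatrix` produces): `toy.mat` is a small hand-made stand-in for `td.mat`, the `tech.words` list is hypothetical, and the chart goes to a temporary PDF file.

```r
# Hypothetical term-document matrix standing in for td.mat:
# rows are terms, columns are postings, cells are counts
toy.mat <- matrix(
  c(1, 0, 1,
    0, 1, 0,
    1, 1, 0,
    0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("php", "android", "developer", "ruby"),
                  paste0("post", 1:3))
)

# Step 3: total occurrences of each term across all postings
term.freq <- sort(rowSums(toy.mat), decreasing = TRUE)

# Step 4: keep only the technology terms (hypothetical hand-picked list)
tech.words <- c("php", "android", "ruby", "ios", "javascript")
tech.freq <- term.freq[names(term.freq) %in% tech.words]

# Step 5: draw a bar chart of technology popularity to a PDF file
out.file <- tempfile(fileext = ".pdf")
pdf(out.file)
barplot(tech.freq, main = "Technologies in job headlines",
        ylab = "Occurrences", col = "steelblue")
dev.off()
```

To run it on the real data, replace `toy.mat` with `td.mat`, and use `png()` instead of `pdf()` if you want an image file. Summing rows counts total occurrences, so a headline that repeats a word counts it twice.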