Archive for the Data mining Category

Will Color Be Bigger Than Google or Facebook?

Posted in Data mining, Entrepreneurship, Facebook, Google, Innovation, Social Media Marketing on April 22, 2011 by Shankar Saikia

In one’s quest to be entrepreneurial one often ponders new products and ideas. This week I started playing with an iPhone application called Color. This is a photo-sharing application like Instagram, but with a twist. Instagram enables you to take pictures with your iPhone and then share them with friends on a social network like Facebook, Twitter etc. Color is different – first of all, it allows you to join a group within a certain distance of where you are physically located. Second, any pictures that you take are shared with that group. Third, and I have not tried this part yet, you can receive messages from group members! It’s a strange application, primarily because it is so different from the social networks that we know – Facebook, Twitter etc. In a conventional social network, you join a network. In Color you join a group without joining a social network. Still don’t get it? Me neither 😉

Two weeks ago Silicon Valley, and by extension the tech world, was abuzz about Color – one reason, and probably the main reason, being that venture capitalists injected $41 million into the company in a Series A round. Once I heard about the investment I became curious, downloaded the app and started experimenting. Then I read that the $41 million is part of the quest for “the new Facebook”. I can see why the search for the new Facebook has begun. It appears that Google is having problems, Twitter is having challenges and Facebook is highly valued – so, no time like the present to search for the next big thing. After all, this is Silicon Valley, where tech innovation never stops.

While I certainly do not fully comprehend everything about the Color app, I admire the founders’ outside-the-box thinking. In some ways, Facebook is becoming predictable and boring. Maybe now is the time to “friend” someone you do not know, as long as she or he is physically near you – wow! How cool would that be? That’s just one aspect of Color. There are other angles such as data mining – for example, how does a person behave when in a certain group? Well, just study the data gathered from that person’s interactions in that group…

These are exciting times in this age of mobile, social and cloud!

Any outside-the-box ideas that you are pondering?


Data Mining: Online Examples

Posted in Data mining, Online, VLAB on January 21, 2010 by Shankar Saikia

Roughly three years ago, I received the following nugget of wisdom regarding entrepreneurship: do something at the “intersection of your interests, skills and where the market is going.” This came from a presenter at VLAB who gave the same advice to her son as he was heading to college and felt that the advice was appropriate for aspiring entrepreneurs as well.

With so much going on in the glamorous worlds of mobile, social networks, music etc., why would one pick something as boring, mundane, prosaic (fill in your favorite synonym for “boring” here: _______) as data? For me it’s because I’m a numbers guy who likes the worlds of planning, forecasting etc., and it appears that data analysis is emerging as an area of growth – for me data mining is at the intersection of my skills, likes and where the market is going.

What is data mining? My informal definition is that data mining is the process of getting some benefit, whether economic or non-economic, from information. A simple example, courtesy of Roger Magoulas from O’Reilly Media, is Amazon’s listing of each book’s “Sales Rank” – that single piece of data helps buyers make decisions.

Everyone is aware of the tremendous growth of social networks. Anyone who uses LinkedIn has probably noticed the “People You May Know” box – that’s a case of LinkedIn using information on the connections between your contacts and your contacts’ contacts – another example of data mining. If you do a search for “data mining” on a job site like Simply Hired, you may find social networking companies like Yelp and Facebook advertising for data mining experts. Today I noticed that Simply Hired itself has added a neat capability – it can show your LinkedIn contacts next to a job listing – the value being that you can ask your LinkedIn contact to give you a referral – another example of using data for your benefit.
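To make the “contacts of your contacts” idea concrete, here is a minimal sketch in Python. I do not know how LinkedIn actually builds its suggestions, so treat this as an illustration only: the names, the toy `connections` data and the `people_you_may_know` function are all made up, and the ranking rule (count shared contacts) is simply the most obvious one.

```python
from collections import Counter

# Hypothetical toy data: each person maps to the set of people they are connected to.
connections = {
    "you": {"ana", "raj"},
    "ana": {"you", "raj", "li", "sam"},
    "raj": {"you", "ana", "li"},
    "li": {"ana", "raj"},
    "sam": {"ana"},
}

def people_you_may_know(graph, person):
    """Rank non-contacts by how many contacts they share with `person`."""
    direct = graph.get(person, set())
    mutual_counts = Counter()
    for contact in direct:
        for candidate in graph.get(contact, set()):
            # Skip the person themselves and anyone they already know.
            if candidate != person and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most shared contacts first; ties broken alphabetically for stable output.
    return sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))

suggestions = people_you_may_know(connections, "you")
print(suggestions)  # [('li', 2), ('sam', 1)]
```

In this toy network “li” comes first because two of your contacts know her, while “sam” is known to only one. A real system would presumably weigh many more signals, but the basic move is the same: data that already exists is turned into a useful suggestion.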

What about those “other” companies?  Can a “normal” company, not just a social networking site, also mine online data? Sure. Take a look at this chart that shows trends for Google searches for rental car companies:

Trends in Google Searches

In this case each rental car company can investigate why there were relatively more searches for Enterprise Rental Car – was it because Enterprise advertised more in a specific location? This is an example of data mining of external information (i.e., information that does not reside within the corporate technology systems). You can view the actual chart here, and even drill into specific locations (for example, there were more searches for Dollar Rental Car in Hawaii).

Hal Varian, an economist who works for Google, recently said that statistics may be a good career choice in the future: “… The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill..”

I hope these examples gave you a better understanding of data mining as it pertains to online data. What do you think?

VLAB: Data Exhaust Alchemy Event (January 19, 2010)

Posted in Data mining, VLAB on January 19, 2010 by Shankar Saikia


I attended the VLAB event “Data Exhaust Alchemy: Turning the Web’s Waste Into Solid Gold” at Stanford GSB today. It was great – Bishop Auditorium was packed. Here’s a very brief summary of what I learned from each speaker:

1. Roger Magoulas, Director of Market Research, O’Reilly Media: He related the story of how, by simply adding the popularity ranking of books, Amazon was able to add a lot of value for customers – a great example of using the data exhaust.

2. JB (Mike John-Baptiste), CEO, PeerSet: PeerSet has developed algorithms that mine web data to help advertisers target the right audience.

3. Mark Breier, General Partner, In-Q-Tel: The venture arm of the CIA has invested in the following companies: Visible Technologies, Palantir and Fortius One. There are many security-related applications of the data exhaust.

4. Jeff Hammerbacher, Vice President of Products and Chief Scientist, Cloudera: He left Facebook because he felt that he did not understand consumer technologies such as online advertising. Cloudera offers a distribution of Hadoop, the open source framework that implements the MapReduce programming model developed at Google.

5. Dr. DJ Patil, Chief Scientist and Sr. Director of Product, LinkedIn: He preferred the word “ecosystem” (over the phrase “data exhaust”) to describe the data created on the web. He mentioned that with every passing day there are fewer people who are not on LinkedIn.

6. Pablos Holman, Futurist, Inventor, Security Expert, and Notorious Hacker: He was AWESOME. He stressed that, from a security perspective, everything that we do online and using mobile devices is in the cloud. He showed a cool demo of how our credit cards are NOT that secure.

My overall opinion is that the speakers and their respective organizations are working on some very difficult and exciting problems related to the growing volume of data. A point that Richard made was that mining the social graph (e.g., our Facebook friends and the things we do, like etc. as recorded on Facebook) is very challenging. The good news is that companies like LinkedIn have been able to extract value from their data and add cool capabilities such as recommending people we can connect to.

Bottom line:  did the meeting give me any ideas for products or services that I can sell to my enterprise customers? ….. Yes.

Data Deluge!

Posted in Data mining, Enterprise Software Sales, Twitter on January 12, 2010 by Shankar Saikia

The year was 2006, George W. was in the White House and Google was the king of search. If you wanted a restaurant review you probably either looked at a copy of Zagat (the book, not online!) or you did a Google search. Fast forward to 2010 and what’s changed? Beyond the obvious change inside 1600 Pennsylvania Avenue, now if you want to research a restaurant you do a Yelp search – what a difference 4 years makes! What else has changed? The biggest change is the growth of data – mobile interactions, Google searches, Twitter tweets, Yelp reviews, Facebook friends and pokes and pictures … it’s a data deluge out there!

I’m increasingly convinced that the next big tech opportunity lies in being able to do something with the data deluge. A recent column in Gigaom mentions that Facebook’s greatest asset is its social graph (i.e., the connections between people and their friends, pokes, pictures etc.). Facebook is working very hard at extracting value from this data. Similarly, Yelp is trying to mine its own user-generated restaurant reviews.

The key to solving this data-mining challenge is to engage in “non-linear thinking.” It’s important to keep in mind that there is more to data mining than the IT-focused steps such as data collection, aggregation, modeling, validation etc. One of the most difficult parts of mining social media information is that the data is mostly unstructured (i.e., text, pictures etc.).
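To show what “unstructured” means in practice, here is a small Python sketch that turns free-form review text into simple counts that can be sorted and compared. Everything in it is an assumption made for illustration: the review snippets are invented, the stop-word list is a tiny made-up one, and counting words is only the crudest first step, not how Yelp or anyone else actually mines reviews.

```python
import re
from collections import Counter

# Hypothetical review snippets, standing in for unstructured user-generated text.
reviews = [
    "Great tacos, friendly service!",
    "The service was slow but the tacos were great.",
    "Slow service. Never again.",
]

# A tiny, made-up stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "was", "but", "were", "a", "and"}

def tokenize(text):
    """Lowercase the text and split it into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def term_counts(texts):
    """Count how many reviews mention each non-stop word."""
    counts = Counter()
    for text in texts:
        # Using a set means a word is counted at most once per review.
        counts.update(set(tokenize(text)) - STOP_WORDS)
    return counts

counts = term_counts(reviews)
print(counts["service"], counts["tacos"], counts["slow"])  # 3 2 2
```

Once the text has become numbers like these, the familiar steps (aggregation, modeling, validation) have something to work with. Getting from messy text to that point is the hard part.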

I’m really excited about the data challenges – this is the age of big data – for more on big data read this.

What do you think? Do you see an opportunity in the data deluge?