Matt Turck & Om Malik on Big Data in New York City

Bloomberg Ventures’ Matt Turck has a post on “the thriving data ecosystem” in NYC. (It’s just “budding” in the title of the post.) The descriptions draw little distinction between unstructured and structured data, but the compelling point is that a large customer base for big data is based here. For the actual who’s who in the Big Apple:

NYC has a number of prominent data scientists, including (but certainly not limited to), Drew Conway and Jake Porway (both of whom are co-founders of Datakind, f/k/a Data without Borders), Max Shron, Cathy O’Neil (who left D.E. Shaw for a startup, Intent Media), Gilad Lotan, etc.  And of course, we have our very own emerging media star (deservedly so) in the person of Hilary Mason, most recently profiled here.

3) A data community:  Whether it’s Data Drinks or meetups, there’s clearly appetite for data nerds to get together and geek out. Both the NYC Predictive Analytics meetup (organized by Alex Lin) and the NYC Machine Learning meetup (organized by Paul Dix and Max Khesin) have over 2,000 members, while the New York Open Statistical Programming Meetup has 1,700 members.

Om Malik touches on the same ground, with the twist that he believes New York has the creative ability to ask the right questions about the data:

Mason believes that New York can leverage big data to its advantage. From art to fashion to media, New York has enough creative talent to be able to ask the right questions from the data. A good example is the Explore feature on Foursquare, which co-founder Dennis Crowley calls the “big data driven recommendation engine for the real world.” (Here is a presentation of technology behind Explore that is pretty cool.)

Posted in Uncategorized

Big Data gets Big Money

Jaikumar Vijayan of Computerworld reports on a spate of recent funding for “big data” companies, including Birst ($26 million), Cloudera ($40 million), MapR ($25 million), 10Gen of MongoDB fame ($32 million), DataStax ($11 million), Domo ($60 million), and Karmasphere ($12 million).

Much of the investor interest stems from massive enterprise demand for big data tools, said Greg McDowell, an analyst with investment banking firm JMP Securities.

Companies like Splunk have been growing at a frenetic pace over the past few quarters adding hundreds of new customers each quarter, he said.

“Big data has become big business,” McDowell said. “Companies are looking for tools to store, manage, manipulate, analyze, aggregate, combine and integrate data.”

McDowell said the market for big data tools is projected to rise from $9 billion in 2011 to $86 billion in 10 years. By 2020, spending on big data tools will account for some 11% of all enterprise IT spending, he added.
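For a sense of scale, the projection above can be turned into an implied yearly growth rate. This is only a rough sketch: it assumes smooth compound growth between the two quoted figures, which the article does not claim, and the function name is made up for illustration.

```python
def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by a start value, end value, and span."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # Figures quoted from McDowell: $9 billion in 2011, $86 billion ten years on.
    rate = implied_annual_growth(start=9.0, end=86.0, years=10)
    print(f"Implied growth: {rate:.1%} per year")
```

Under that assumption the market would have to grow by roughly a quarter every year for a decade, which is worth keeping in mind when reading the cautions that follow.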

But Curt Monash of Monash Research has words of caution on the hype:

Curt Monash, an analyst at Monash Research, suggested that at least some of the intense investor interest stems from the hype surrounding big data technology over the past year or so.

“A great example of hype is anybody calling Birst a ‘big data’ or ‘big data analytics’ company,” Monash said in an email to Computerworld. “If anything, Birst is a ‘little data’ analytics company that claims, as a differentiating feature, that it can handle ordinary-sized data sets as well.”

“The great growth in database sizes is both caused and balanced out by Moore’s Law,” he said. “The net effect is healthy but not enormous growth in the overall data management and analytics markets.”

 

Posted in Funding/Investments

Get Ahead With Data Science at General Assembly in New York

The incubator-cum-educational cauldron that is General Assembly in New York is offering a ten-week class to get you up to speed, running June 5th through July 9th on Tuesday and Thursday evenings:

Get Ahead with Data Science

The sheer wealth of data collected these days calls for people who can make sense of it all. Data science describes an end-to-end process – from data collection to final presentation – requiring statistical knowledge and data processing skills that help surface valuable insights.

We’re excited to announce a 10-week program in Data Science taught by Max Shron, former data scientist at OKCupid. Students will examine and practice all parts of the data science pipeline – problem specification, data transformation, data exploration, statistical modeling, and visual presentation – and learn how to build reusable tools to work more efficiently with data sets.

This course is designed for students with proficiency in a scripting language such as Python, R, or Matlab, familiarity with elementary probability, and a comfort with basic data manipulation.

Posted in Data Science

Google’s Big Data gets bigger…and cheaper than Amazon

Quentin Hardy of the New York Times’ Bits blog reports on pricing for Google’s BigQuery. The gap between $0.12 per gigabyte of storage at Google and $0.125 per gigabyte at Amazon Web Services probably isn’t as significant as the notion that a reverse auction could soon be going on between major players. There will also be a charge per query. Google claims greater consumer friendliness, but has a track record of iterating on products only to remove them or drop support. Who would you trust your Big Data with?

“When you have really large data sets, we have the capability to analyze them,” said Ju-kay Kwek, product manager for Google’s cloud data effort. “A query with five terabytes of data involved could be returned in 15 seconds.” That is, he said, about 10 times faster than the speed of many corporate data systems. He noted that in companies today, “it’s not uncommon to have problems that take half a day to analyze.”

Google’s aim may be to sell data storage in the cloud, as much as it is to sell analytic software. A company using BigQuery has to have data stored in the cloud data system, which costs 12 cents a gigabyte a month, for up to two terabytes, or 2,000 gigabytes. Above that, prices are negotiated with Google. BigQuery analysis costs 3.5 cents a gigabyte of data processed.
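To put the quoted prices in concrete terms, here is a small back-of-the-envelope sketch. It assumes the Amazon figure of $0.125 per gigabyte mentioned earlier in this post is also a monthly rate, and it treats a terabyte as 1,000 gigabytes, as the article does. The function and constant names are invented for illustration and are not part of any Google or Amazon API.

```python
# Prices as quoted in this post (2012 figures, US dollars).
GOOGLE_STORAGE_PER_GB_MONTH = 0.12   # applies up to 2,000 GB; above that is negotiated
AMAZON_STORAGE_PER_GB_MONTH = 0.125  # assumption: also a per-month rate
BIGQUERY_PER_GB_PROCESSED = 0.035
GOOGLE_TIER_LIMIT_GB = 2_000


def google_monthly_storage_cost(stored_gb: float) -> float:
    """Monthly storage bill at the quoted list price."""
    if stored_gb > GOOGLE_TIER_LIMIT_GB:
        raise ValueError("above 2,000 GB the price is negotiated with Google")
    return stored_gb * GOOGLE_STORAGE_PER_GB_MONTH


def query_cost(processed_gb: float) -> float:
    """Charge for one query, based on the volume of data it processes."""
    return processed_gb * BIGQUERY_PER_GB_PROCESSED


if __name__ == "__main__":
    stored = 2_000  # GB, the top of the list-price tier
    google = google_monthly_storage_cost(stored)
    amazon = stored * AMAZON_STORAGE_PER_GB_MONTH
    print(f"Google storage, {stored:,} GB: ${google:,.2f} per month")
    print(f"Amazon storage, {stored:,} GB: ${amazon:,.2f} per month")
    print(f"Difference: ${amazon - google:,.2f} per month")
    # Kwek's example query touches five terabytes, i.e. 5,000 GB.
    print(f"One 5 TB query: ${query_cost(5_000):,.2f}")
```

On these numbers the storage gap is about ten dollars a month at two terabytes, while a single five-terabyte query costs far more than that, so the per-query charge is the figure that matters for heavy analysis.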

Amazon doesn’t break out AWS results, but in January Jeff Barr reported 762 billion objects stored, and the race for market share is on.

Posted in The Cloud

Phenotypes in Big Data

Big data as a concept centers around computing power in unstructured datasets.  For health care applications sometimes the data needs to get dressed up.  Information Week reports on SUNY-Buffalo researchers using IBM and Revolution Analytics’ software to hunt for the causes of multiple sclerosis.

Unlike Netezza’s famous cousin Watson, which is designed to work with unstructured data, Netezza needs structured databases to do its magic, Dolley noted. Genetic data and clinical documentation in the free text portions of electronic health records are unstructured. So the scientists will have to spend a considerable amount of time cleaning up their data and making it manageable. But once the information is ready, IBM’s system can analyze it all within minutes.

This sort of work is at the heart of the need for 1.5 million new data scientists, and perhaps of a more general understanding of how data science will bleed into nearly all white-collar, information-processing jobs.

Posted in Big Data, Data

Wanted: 1.5 million Data Scientists

“Big Data” is the next big thing according to the Wall Street Journal.  Who is going to do it?

Keying off a McKinsey & Co. report from last year, the WSJ reports there will be a need for 1.5 million new data scientists.

Hilary Mason says there are three necessary skills for big data usage: theorizing data, actually building big data, and then asking the right questions. In the embedded video, she claims the question is not how big your company is, but whether the decisions you make could be improved by more data.

One has to add that there is a threshold beyond that: are the improvements you can make in the use of your data likely to exceed the cost of doing a “big data” implementation? AdSense and other self-serve web analytics like bit.ly have made that easy, but this seems an incremental advance on the web revolution, not a revolution in itself. Theorizing about how to build the model presumes a great deal of knowledge about the real-world business processes your organization (and/or your competitors) use. That is likely the harder problem.

Posted in Big Data, Data, Data Computing, Stats

Workday in April 21st Barron’s

Mark Veverka’s Barron’s “Plugged In” column is the regular spot in the financial press for authoritative coverage of enterprise software, and two weeks ago it covered Workday, the cloud provider of human resource management software founded by David Duffield: “PeopleSoft 2.0.” He reported the company was doing $300 million in billings and could be valued at $2 billion accordingly. The column largely recapped ground from his 2011 coverage, but speculated an IPO was imminent.

The article also reported an estimate by Morgan Stanley’s Adam Holt that cloud software sales would rise to $35 billion by 2017.

Posted in Organizational Technology

UC Berkeley Chosen to Host James Simons’ “Theory of Computing Institute”

James Simons, the hedge fund manager behind Renaissance Technologies and a Berkeley Ph.D., chose his alma mater as the winner of a $60 million grant to develop a “Theory of Computing Institute.” It will ramp up this summer and be fully operational in 2013. Richard Karp will be the institute’s director. Aside from other philanthropic endeavors such as autism research, Simons previously gave $150 million to SUNY-Stony Brook. It might also be noted that Renaissance is probably one of the leading practitioners of “big data,” filtering it for meaningful signals and acting on them through trading algorithms. That is a long way from the Jesse Livermore gut traders: even more than two years ago, Bloomberg reported that nearly a third of its then 275 employees had Ph.D.s in the natural sciences.

The full UC Berkeley statement is here. The institute, which will add seventy staff locally, is another validation of the leading role of the Bay Area in big data theory and processing. John Markoff noted in his New York Times coverage that this follows the establishment last year of Boston University’s Hariri Institute for computing and computational science.

Posted in Cloud Computing, Data Computing