Tuesday, November 15, 2011

Useful Dataset sites

Dataset Lists

Dublin Dashboard
http://www.dublindashboard.ie/pages/index

Quandl.com
http://www.quandl.com

KDnuggets: Datasets for Data Mining
http://www.kdnuggets.com/datasets/

Google: Dataset Directory
http://www.google.com/publicdata/directory

opendata.ie
http://opendata.ie/

StatLib---Datasets Archive
http://lib.stat.cmu.edu/datasets/

U.S. Census Bureau
http://factfinder.census.gov/servlet/DatasetMainPageServlet

Datasets: 2010 UK Election Results
http://www.guardian.co.uk/news/datablog/2010/may/07/uk-election-results-data-candidates-seats

Computer Vision Papers Datasets
http://www.cvpapers.com/datasets.html

Dataset Analytics Vocabulary
http://vocab.sindice.net/analytics

Datasets - DNA Analytics CGH
http://www.genomics.agilent.com/GenericA.aspx?pagetype=Custom&subpagetype=Custom&pageid=2079

Datasets collected by bitly
https://bitly.com/bundles/hmason/1

MCFC Analytics
http://www.mcfc.co.uk/The-Club/MCFC-Analytics

Truthy: Information Diffusion in Online Social Networks
http://cnets.indiana.edu/groups/nan/truthy/


Datasets from Competitions

KDnuggets
http://www.kdnuggets.com/competitions/index.html

Berlin Brain-Computer Interface
http://www.bbci.de/competition/

Netflix Prize
http://www.netflixprize.com/

ACM KDD Cup
http://www.sigkdd.org/kddcup/index.php

Santa Fe Time Series Competition Data Set B
http://www.physionet.org/physiobank/database/santa-fe/

Time Series Forecasting Grand Competition for Computational Intelligence
http://www.neural-forecasting-competition.com/downloads/NN5/datasets/download.htm

PAN 2012 - Uncovering Plagiarism, Authorship and Social Software Misuse
http://www.uni-weimar.de/medien/webis/research/events/pan-12/pan12-web/authorship.html


Brian Mac Namee recommends:

UC Irvine Machine Learning Repository
http://archive.ics.uci.edu/ml/

Central Statistics Office Ireland
http://cso.ie (also check out the 2011 census data)

InfoChimps: Find data for apps & analytics
http://www.infochimps.com/

Data in Gapminder World
http://www.gapminder.org/data/

Welcome to the London Datastore
http://data.london.gov.uk/

Kaggle
http://www.kaggle.com/

U.S. data.gov
http://www.data.gov/ 

Last.fm Music Preferences Data
http://denoiserthebetter.posterous.com/music-recommendation-datasets

Opinion Mining, Sentiment Analysis, and Opinion Spam Detection
http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html

A Big List From Mahout
https://cwiki.apache.org/MAHOUT/collections.html

What Is Data in Literary Studies?
http://arcade.stanford.edu/content/what-data-literary-studies-1


Brendan Spillane recommends:

IMDB - Alternative Interfaces
http://www.imdb.com/interfaces

Export.ly: Export your data from social media as Excel or CSV
http://www.makeuseof.com/dir/exportly-export-your-data/


Garrett Duffy recommends:

LOGD Dataset Catalog
http://logd.tw.rpi.edu/datasets

Alan Cooke recommends:

DBpedia
http://dbpedia.org/About

DataSift
http://datasift.com/


Colman McMahon recommends:

30 Places to Find Open Data on the Web
http://blog.visual.ly/data-sources/

Finding Data on the Internet
http://www.inside-r.org/howto/finding-data-internet

WikiViz
http://www.wikiviz.org/wiki/Data_sources

Forbes: Special Report - Data Driven
http://www.forbes.com/special-report/data-driven.html

Monday, November 14, 2011

Top 10 Business Intelligence, Analytics and CPM Stories

For Data Analytics

Please go to the following link:

http://searchbusinessanalytics.techtarget.com


Register for the site, then look at the article
on the homepage entitled "Top 10 Business Intelligence,
Analytics and CPM Stories of 2010":
http://searchbusinessanalytics.techtarget.com/news/2240028281/Top-10-business-intelligence-analytics-and-CPM-stories-of-2010

--------------------------------------------------------------------

10. Mega-vendors boss the BI market – but their power isn’t absolute
In its 2010 BI Magic Quadrant report, consulting firm Gartner Inc. said that the usual mega-vendor suspects – IBM, Microsoft, Oracle and SAP – continue to dominate the BI software market. But customer satisfaction levels were down for some of them, according to Gartner – a situation that SAP, for one, tried to remedy via expanded support for BusinessObjects users. Gartner found the same kind of forces at play among CPM vendors: A separate Magic Quadrant report split that market into CPM integrators and innovators, and Gartner later said that Software as a Service (SaaS) and pure-play CPM vendors topped the customer satisfaction rankings in a survey of vendor-supplied reference users. That’s all the more reason to make sure you buy the right BI, analytics and CPM tools for your organization.

9. Companies still eyeing BI consolidation/standardization
More than half of the respondents to a SearchBusinessAnalytics.com survey on BI priorities and challenges last March reported that their organizations were using multiple BI tools. With so much BI software in place, many organizations continue to eye BI tool consolidation as a way to clean up their systems and reduce costs. Florida State University is one example: A multi-year BI standardization project ended up saving the school about $350,000 in software license and maintenance fees as well as support costs, according to CIO Michael Barrett. But companies would be smart not to jump on the BI consolidation bandwagon without giving it a lot of thought first, cautioned Baseline Consulting’s Jill Dyche: “Shelfware is one thing, but I’d strongly advise you not to rip any type of valuable reporting or analytics capability out of the hands of an earnest and well-meaning business user.”

8. CPM’s horizons broaden, but usage remains relatively low
While most businesses are sold on using BI tools, CPM software is a completely different ballgame. Despite the technology’s potential benefits, CPM adoption levels remain low, according to analysts and our survey. But among the organizations that are using CPM tools, a growing number are looking to the software for help with more than just forecasting, budgeting and planning. In addition, on-demand CPM software is helping small and medium-sized businesses overcome barriers to adopting the technology.

7. Vendors push mobile BI – is anyone listening?
With almost everyone (and their pets) owning smartphones and more and more people buying iPads, BI vendors increasingly are pushing mobile BI software for use in accessing reports and executive dashboards on mobile devices. But mobile BI doesn’t appear to be a major priority for a lot of companies at this point. For example, only about 30% of the respondents to a survey conducted by consultant Howard Dresner said they were actively using mobile BI tools, and there was an almost even split on whether mobile BI is an important technology. The real value of mobile BI, according to Jill Dyche, “lies in the field, or in the stores, or on the manufacturing floor” – as a tool for end users who “need information on demand” in order to do their jobs.

6. Social media analytics enters the picture
As more and more people use social networking sites such as Facebook and Twitter, companies increasingly are turning to those sites to engage their customers and track what people think of – and are saying about – their products and services. And BI and analytics vendors are offering tools designed to help businesses mine and make sense of social media data. Last spring, for example, SAS unveiled a social media analytics suite for use in analyzing blog posts, tweets and Facebook status updates. But some analysts and BI professionals have questions about the functionality and maturity of the social media analytics software that’s currently available. For now, experienced users said, the key to social media analytics success for organizations that are pursuing the technology lies in commitment, experimentation – and patience.

5. A heavy layer of fog obscures visibility of agile BI
Agile business intelligence emerged as a much-discussed concept during 2010, but there’s still a lot of confusion about what agile BI really is. At a TDWI conference in August, some attendees thought it referred to applying agile development principles to their BI environments, others thought it meant the ability of BI to help an organization become more adaptable, and still more thought it was just another buzzword. Wayne Eckerson, then research director at TDWI and now head of research for TechTarget’s Enterprise Applications Media Group, thinks agile BI includes elements of all three of those viewpoints. It’s more of a mentality aimed at making businesses “go as fast as possible” than a specific methodology, Eckerson said. On the other hand, Dyche’s take is that “many companies are attracted to agile [BI] approaches because they don’t have the organizational discipline to instill solid BI development processes.” Ouch!

4. SaaS BI steps into the limelight
SaaS BI software has been around for years, but the cloud-based technology – which holds out the promise of faster deployments and reduced hardware and system management requirements – finally began stepping out of the BI shadows during 2010. SAP helped generate some of the buzz by releasing a new SaaS BI product suite that it claimed could bring “BI to the rest of us.” And companies such as Genband Inc. and Wine Management Systems are actively using SaaS BI and reporting tools to streamline the process of building reports for business users or to enable customers to create their own customized reports. Still, cloud-based BI might not be right for everyone; it’s important to know about and prepare for the challenges and obstacles that come with using SaaS BI software.

3. Pervasive BI, expanded use of tools become bigger BI priorities
A growing number of companies say that they’re looking to make their BI systems more pervasive by giving more business users access to the technology. In addition, more and more organizations are working to broaden their use of BI tools beyond basic reporting and data analysis. But efforts to expand the BI process can take years to complete because of data quality problems and other challenges. As mentioned above, SaaS and on-demand BI tools could help enable pervasive BI deployments; technologies such as data visualization, social media analytics and unstructured data analysis are also seen as having potential for spreading BI to more business users. But regardless of how businesses reach the pervasive peak, training end users is the key to successful pervasive BI projects, according to Rick Sherman, founder of consulting firm Athena IT Solutions. His reasoning is simple: People won’t use the technology if they don’t know how or why they should be using it.

2. Still the king: Excel continues its BI tool supremacy
Go to any BI or data warehousing conference and you’ll likely hear about the evils and data management disasters that come with all of the Excel-based “spreadmarts” that business users refuse to let go of. In fact, you might think that Excel is akin to the bubonic plague – and for a lot of businesses with poor spreadsheet management practices, you might be right. But according to Gartner analysts and attendees at the firm’s annual Business Intelligence Summit, it’s time for IT and BI managers to wave the white flag on using Excel for BI purposes. Their advice: Make your peace with spreadsheets and focus on developing processes for properly using Excel in BI projects. That was music to Microsoft’s ears, of course. Hoping to further capitalize on Excel’s continuing BI popularity, the software vendor released a PowerPivot for Excel add-in that lets end users integrate nearly unlimited amounts of data into their spreadsheets for analysis – although it also added a SharePoint version with management capabilities designed to help ease the collective minds of IT groups.

1. Interest in predictive analytics heats up
A relatively small number of the organizations that responded to the 2010 SearchBusinessAnalytics.com survey were using predictive analytics tools – just 16%. But 48% said that they planned to add predictive analytics software within the next 12 months, giving it the top spot on the analytics technology adoption list. Industry analysts also see predictive analytics as the next big battleground for BI vendors, which increasingly are developing or acquiring predictive analytics technology with the goal of incorporating it into their core platforms. In October, for example, IBM announced a new version of its Cognos BI software with predictive analytics capabilities built in. Thus far, many of the early adopters of predictive analytics are focusing not on wider market and economic trends but on individual customer analysis in an effort to understand what specific customers are likely to buy so that marketing campaigns and up-sell offers can be tailored to them.

Sunday, November 13, 2011

Sabermetrics

Sabermetrics is something I've become very interested in since I was over in America for a while. Sabermetrics is the analysis of baseball through statistics that measure in-game activity.



With so much data available, I've written loads of code to explore which factors best determine whether a team will win or lose. I have had very little success, but here's something interesting to think about:

2010 became known as the "Year of the Pitcher", in contrast to previous years, when it was the batter who determined the outcome of the game. Commentators have suggested that rigorous testing and penalties for performance-enhancing drug use may be one factor behind this. Runs per game fell to their lowest level in 18 years, and the strikeout rate was higher than it had been in half a century.
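
For anyone who wants a starting point on the win/lose question, one well-known sabermetric rule of thumb is Bill James's Pythagorean expectation, which estimates a team's winning percentage from runs scored and runs allowed. The sketch below is not the code mentioned above; the function names are made up for illustration, the exponent of 2 is the classic form (other exponents are sometimes used), and the season totals are hypothetical rather than real team data.

```python
def pythagorean_expectation(runs_scored, runs_allowed, exponent=2.0):
    """Estimate winning percentage as RS^e / (RS^e + RA^e)."""
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("at least one run total must be positive")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Scale the estimated winning percentage to a season of `games` games."""
    return games * pythagorean_expectation(runs_scored, runs_allowed, exponent)


if __name__ == "__main__":
    # Hypothetical season totals, chosen only to show the calculation.
    scored, allowed = 800, 700
    print("Estimated winning percentage:",
          round(pythagorean_expectation(scored, allowed), 3))
    print("Estimated wins over 162 games:",
          round(expected_wins(scored, allowed), 1))
```

Comparing a team's actual record with this estimate is one simple way to see how much of winning is explained by run totals alone.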

Thursday, November 10, 2011

The Literary Digest Survey


The Literary Digest is almost certainly best remembered today for the circumstances surrounding its demise. It conducted a "straw poll" on the likely outcome of the 1936 presidential election. The poll showed that the Republican governor of Kansas, Alf Landon, would likely be the overwhelming winner. This seemed possible to some, as the Republicans had fared well in Maine, where the congressional and gubernatorial elections were then held in September, unlike the rest of the nation, where these elections were held in November along with the presidential election, as they are today. It seemed especially plausible in light of the conventional wisdom, "As Maine goes, so goes the nation", a truism coined because Maine was regarded as a "bellwether" state that usually supported the winning candidate's party. In the event, Franklin D. Roosevelt won in a landslide, carrying every state except Maine and Vermont.

http://en.wikipedia.org/wiki/The_Literary_Digest
http://historymatters.gmu.edu/d/5168/

http://faculty.muhs.edu/kresovic/MT177/assignments/05/Squire.pdf