Road Map to Statistical Analysis

STATISTICAL ANALYSIS 1: ESTIMATION AND TESTING
The main focus of this course is employing samples to make inferences about statistical properties of the underlying population, such as means, variances and proportions. Students will learn how to construct interval estimates and how to test hypotheses for these statistics. This methodology will also be extended to the comparison of multiple populations using ANOVA. Students will also learn the crucial differences between data collected from experiments and data collected from observational studies, and their implications for the use of statistical methods. The course will use applications from various functional areas of business to illustrate these concepts.
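As an illustration of the two core ideas of this course, interval estimation and hypothesis testing for a mean, a minimal sketch follows. It is not course material: the function names and the sample values are hypothetical, and it uses the large-sample normal approximation (a t distribution would be the more usual choice for a sample this small).

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical sample, e.g. daily sales figures (illustrative values only).
SAMPLE = [48.0, 52.5, 50.0, 49.5, 51.0, 50.5, 47.0, 53.0, 50.0, 49.0]

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Assumption: normal approximation (z critical value), not a t interval.
    """
    n = len(sample)
    centre = mean(sample)
    std_err = stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * std_err, centre + z * std_err

def z_test_mean(sample, mu0):
    """Two-sided test of H0: population mean equals mu0 (normal approximation)."""
    n = len(sample)
    std_err = stdev(sample) / math.sqrt(n)
    z = (mean(sample) - mu0) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

low, high = mean_confidence_interval(SAMPLE)
z_stat, p_val = z_test_mean(SAMPLE, mu0=52.0)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
print(f"z = {z_stat:.2f}, p-value = {p_val:.4f}")
```

A small p-value suggests that the hypothesized mean is hard to reconcile with the sample, and the interval gives the range of means that the sample does support at the chosen confidence level.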
DATA MANAGEMENT 1
Databases and Database Users, Database System Concepts
Data Modeling using Entity Relationship Model
Relational Data Model and Relational Database Constraints
Relational Database Design
SQL-99: Schema Definition, Constraints, Queries and Views, SQL Programming Techniques
Data Warehouse: concepts & ETL
Modeling Data Warehouses
Data Cubes, operations and query exploration
Applications.
DATA COLLECTION
Introduction – sources of data
Types of data – structured, unstructured, qualitative, quantitative
Database architecture and data gathering process
Overview of an online survey/research project – Phases, purpose, problem statement, conceptualization, execution, measurement
Sampling – Sampling terminology, Kinds of sampling, Sample size, Sampling and non-sampling errors, Sample table of random numbers
Data collection – various kinds of data and secondary data
Using registration and other non-invasive data collection methods
Assignment – Online survey
Building a click-stream database
Using Web crawlers and Bots
Using APIs to access third-party data
Text mining and semantic web techniques
Handling missing data and deriving data attributes
Documenting data and building a metadata layer
Case study.
STATISTICAL ANALYSIS 2: REGRESSION MODELS
Correlation and Regression
Role of regression in analytics
Linear regression, assumptions, Inference using the least squares method, Need for diagnostics, Collinearity, Dummy variables, Heteroscedasticity, Autocorrelation, Influential observations, Subset selection, Transformations, Steps involved in regression modeling, Case study. Software used is R.
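To make the least squares method named above concrete, here is a minimal sketch of a simple (one-predictor) linear regression fit. The course itself uses R; Python is used here only for illustration, and the function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def residuals(xs, ys, intercept, slope):
    """Observed minus fitted values, the starting point for diagnostics."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical data: advertising spend (x) and sales (y).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.9, 5.2, 6.8, 9.1, 11.0]
b0, b1 = fit_line(spend, sales)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
print("residuals:", [round(r, 3) for r in residuals(spend, sales, b0, b1)])
```

Plotting the residuals against the fitted values is a common first diagnostic for problems such as heteroscedasticity or influential observations.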
OPERATIONS 1: SIMULATION
Simulation is the process of designing a model of a system and conducting experiments to understand the behavior of the system. It can also be used to evaluate various strategies by trying “what if” scenarios in an uncertain environment. Modeling & Simulation has become an important tool in several functional areas in business, including finance, marketing, operations management, organizational behavior, and strategy. It can be used within all life cycle phases of a project, including requirements analysis, concept exploration & evaluation, design & development, integration and test & evaluation, and production & sustainment.
The goal of this course is to introduce participants to the principles of simulation and how such models are developed and used in various practical and functional areas.  The course will cover Monte Carlo and Discrete Event simulation and related application areas.
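A Monte Carlo “what if” experiment of the kind described above might look like the following minimal sketch. Everything specific in it is an assumption made for illustration: the single-period ordering problem, the price and cost figures, and the normally distributed demand.

```python
import random

def simulate_profit(order_qty, n_trials=10_000, seed=42):
    """Estimate the expected profit of ordering `order_qty` units.

    Assumptions (hypothetical): unit price 10, unit cost 6, demand drawn
    from a normal distribution (mean 100, sd 20) truncated at zero, and
    unsold units have no salvage value.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    price, cost = 10.0, 6.0
    total = 0.0
    for _ in range(n_trials):
        demand = max(0.0, rng.gauss(100, 20))
        sold = min(order_qty, demand)
        total += price * sold - cost * order_qty
    return total / n_trials

# Try several "what if" order quantities and compare the average profit.
for qty in (60, 80, 100, 120, 140):
    print(qty, round(simulate_profit(qty), 1))
```

Using the same seed for every order quantity means each strategy is evaluated against the same simulated demand scenarios, which makes the comparison between strategies less noisy.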
DATA VISUALIZATION
Beauty of Data Visualization – what and why
Design concepts – Line charts, Area graphs, etc
Data exploration and Interactive dashboards
Visualization in a multi-device world – using space effectively
Creating meaning with data – Excel and PowerPoint visualization
Time dimension in data visualization
Assignment 1 presentation – using Excel and PowerPoint in visualization
Advanced Visualization tools (OLAP, Tableau, Spotfire, QlikView, etc.)
Data gathering and data staging for visualization
Using visualization to build data trust
Impacting corporate culture using data visualization and collaborative analysis
Text visualization – tag clouds, keyword weighting, word tree, etc
Social data analysis
Non-traditional and statistical visualization.
STATISTICAL ANALYSIS 3: ADVANCED STATISTICAL MODELS
Regression Models for Count Data: Generalized Linear Models: Binary and multinomial logistic regressions, Poisson regression, Zero-inflated Poisson regression, Negative Binomial regression
Survival analysis: Introduction: Censoring and truncation
Characteristics of survival analysis data: Time-to-event data. Hazard and survival functions
Kaplan-Meier estimate of survival function
Cox proportional hazards (PH) model, estimation and its analysis
Extensions
Stratified PH
PH with time-varying covariates
Parametric survival analysis with standard distributions
Accelerated failure time models
Business applications: Customer lifetime estimation.
Design of experiments: Basic concepts: randomization, replication and control
Experimental design for testing differences in several means: Completely randomized and randomized complete block designs
Cross-over designs
Two-level factorial experiments—full and fractional
Plackett-Burman designs
Designs for three or more levels. Taguchi designs. Response surface designs
Business applications: Case-Control designs for campaign evaluation
Designs for conjoint analysis.
Missing value analysis: Missing value patterns: Missing completely at random (MCAR)
Missing at random (MAR)
Missing not at random (MNAR)
List-wise deletion
Pair-wise deletion
Various imputation methods: Hot deck imputation, Mean substitution, Regression imputation, EM imputation
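Two of the simplest strategies listed above, list-wise deletion and mean substitution, can be sketched as follows. This is an illustration only: the record layout, the column names and the helper functions are hypothetical, and missing values are represented by None.

```python
from statistics import mean

# Hypothetical customer records; None marks a missing value.
RECORDS = [
    {"age": 34, "income": 52000.0},
    {"age": 41, "income": None},
    {"age": None, "income": 61000.0},
    {"age": 29, "income": 47000.0},
]

def listwise_delete(rows):
    """Keep only the rows in which every field is observed."""
    return [row for row in rows if all(v is not None for v in row.values())]

def mean_substitute(rows, column):
    """Replace missing values in `column` with the mean of the observed ones."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]

print(len(listwise_delete(RECORDS)), "complete rows remain")
print(mean_substitute(RECORDS, "income"))
```

List-wise deletion discards information and can bias results unless the data are MCAR, while mean substitution keeps every row but understates the variability of the imputed column. These trade-offs are one reason regression and EM imputation are also on the list.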
FORECASTING ANALYTICS
Regression and time series paradigms of forecasting, visualization and exploration of regression data, forecasting based on regression models, visualization and exploration of time series data, forecasting based on time series models, evaluating forecast performance, neural networks, introduction to advanced models, forecasting binary variables, use of forecasting software.
DATA MINING 1: SUPERVISED LEARNING
Classification and prediction
Bayes classification: error probability
Data partitioning and performance evaluation, training set and test set errors, cross-validation
Variable and feature selection
Classification Methods: Discriminant analysis, Nonparametric density-based methods, Naïve Bayes, K-nearest neighbors, Neural nets, Classification trees, Support Vector Machines
Ensemble methods, bagging & boosting
Using Data Mining Software (XLMiner or R)
DATA MANAGEMENT 2: BIG DATA
The basics of Big Data analytics – what it is, why it is needed, and real-world applications
The fundamentals of the MapReduce programming model to crunch and analyze Big Data, and hands-on experience with Hadoop
Big Data Text Analytics for understanding and mining large volumes of unstructured text data
Big Data Visualization for finding global trends and local structures in Big Data.
CONTEMPORARY ANALYTICS 1
Social media: Introduction, History of Social media, Basics of Social Media and Business Models, Basics of Web Search Engines and Digital Advertising.
Digital analytics and digital attribution: Web analytics, Experimental methods in web data analytics, Econometric modeling of search engine ads.
User generated content and social listening: Sentiment Analysis, Word of Mouth, Text Mining of User Generated Content.
Online communities and Social networks I: Measuring the Impact of Social Networks
Online communities and Social networks II: Facebook Insights Data Analysis, Social Media and Viral Marketing, Using Stata
Collective Intelligence and Social Media: Harnessing the Wisdom of Crowds, Contests and Communities, Crowd-sourcing, Crowd-funding
Mobile: Mobile ecosystem, Use of Technology for E-commerce: Impact of IT interventions in web site design on E-commerce
OPERATIONS 2: OPTIMIZATION
This course covers optimization (finding “what’s best” from the available options) and decision analysis (deciding the “what now”, in the sense of what we should do given the information we have from the past). The emphasis is on models that are widely used in diverse industries and functional areas, including operations, finance and marketing. The course will introduce deterministic constrained optimization, network optimization, stochastic models and non-linear optimization.
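As one small instance of network optimization, the sketch below finds the cheapest route through a distribution network using Dijkstra's shortest-path algorithm. The network, the node names and the costs are hypothetical, and the algorithm assumes that all edge costs are non-negative.

```python
import heapq

# Hypothetical distribution network: shipping cost between locations.
NETWORK = {
    "plant": {"hub_a": 4, "hub_b": 2},
    "hub_b": {"hub_a": 1, "store": 7},
    "hub_a": {"store": 3},
}

def shortest_path_cost(graph, source, target):
    """Minimum total cost from source to target (Dijkstra's algorithm).

    Returns float('inf') when the target cannot be reached.
    """
    best = {source: 0}
    queue = [(0, source)]  # priority queue of (cost so far, node)
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return float("inf")

print(shortest_path_cost(NETWORK, "plant", "store"))
```

In this example the cheapest route goes plant, hub_b, hub_a, store rather than taking the direct link from the plant to hub_a, which shows why a greedy choice of the nearest hub is not always optimal.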
CONTEMPORARY ANALYTICS 2 (OR ANALYTICS LEADERSHIP)
Customer Analytics for New Product Development: Segmentation Analytics: Cluster Analysis
Customer Targeting Analytics: Discriminant Analysis and GE Portfolio Matrix
Market Positioning Analytics: Multidimensional Scaling and Factor Analysis
Product/Service Design Analytics and optimal market offering: Conjoint Analysis.
DATA MINING 2: UNSUPERVISED LEARNING
Principal components
Canonical correlations
Measuring data similarity and dissimilarity
Mining Frequent Patterns – association rules
Pattern Mining
Clustering Methods
Clustering High Dimensional Data
Outlier Detection
Applications: Spatio-temporal, recommendation systems
Lab work on SQL, SQL Programming and Data Warehouse operations
ACTIVE LEARNING PROJECTS (ALP)
Choose two projects from the list of topics (depending on faculty offering).
Sample list:
Marketing Analytics (product positioning, brand equity assessment, consumer purchasing behavior, the effectiveness of marketing campaigns), Risk Analytics (portfolio analysis,…), Search Analytics, Online Advertising Analytics, Fraud Detection