Road Map to Statistical analysis

Vishal Sinha
April 12, 2014

STATISTICAL ANALYSIS 1: ESTIMATION AND TESTING

The main focus of this course is using samples to make inferences about statistical properties of the underlying population, such as means, variances and proportions. Students will learn how to construct interval estimates and how to test hypotheses about these quantities. The methodology will also be extended to the comparison of multiple populations using ANOVA. Students will also learn the crucial differences between data collected from experiments and data collected from observational studies, and the implications of those differences for the use of statistical methods. The course will use applications from various functional areas of business to illustrate these concepts.
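The following minimal Python sketch makes interval estimation and hypothesis testing concrete using only the standard library. The syllabus does not name a tool for this module, so the language, the function names (`mean_confidence_interval`, `z_test_mean`) and the sample data are illustrative assumptions. The sketch uses the large-sample normal (z) approximation. For small samples, a t critical value with n - 1 degrees of freedom would replace the z value.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Large-sample (z-based) confidence interval for a population mean."""
    n = len(sample)
    x_bar = mean(sample)
    se = stdev(sample) / math.sqrt(n)               # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return x_bar - z * se, x_bar + z * se


def z_test_mean(sample, mu0):
    """Two-sided large-sample test of H0: mu = mu0; returns (z statistic, p-value)."""
    n = len(sample)
    z = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical sample, e.g. daily order counts for a store
    sample = [10, 12, 9, 11, 13, 10, 12, 11]
    print(mean_confidence_interval(sample))
    print(z_test_mean(sample, mu0=10))
```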

DATA MANAGEMENT 1

Databases and Database Users, Database System Concepts

Data Modeling using Entity Relationship Model

Relational Data Model and Relational Database Constraints

Relational Database Design

SQL-99: Schema Definition, Constraints, Queries and Views, SQL Programming Techniques

Data Warehouse: concepts & ETL

Modeling Data Warehouses

Data Cubes, operations and query exploration

Applications.
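The following sketch illustrates schema definition, constraints, queries and views using Python's built-in `sqlite3` module. SQLite is a stand-in chosen here for convenience; it is not necessarily the database used in the course. The table names, columns and rows are hypothetical. Note that SQLite only enforces foreign keys once the `foreign_keys` pragma is switched on.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables, a view and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
            amount      REAL CHECK (amount >= 0)
        );
        -- A view: a stored query that can be selected from like a table
        CREATE VIEW customer_totals AS
            SELECT c.name, SUM(o.amount) AS total
            FROM customer c JOIN orders o ON o.customer_id = c.customer_id
            GROUP BY c.name;
    """)
    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0)])
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(conn.execute("SELECT name, total FROM customer_totals ORDER BY name").fetchall())
```

Inserting an order for a customer that does not exist, or one with a negative amount, raises `sqlite3.IntegrityError`. This is how the declared constraints protect the data.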

DATA COLLECTION

Introduction – sources of data

Types of data – structured, unstructured, qualitative, quantitative

Database architecture and data gathering process

Overview of an online survey/research project – Phases, purpose, problem statement, conceptualization, execution, measurement

Sampling – Sampling terminology, Kinds of sampling, Sample size, Sampling and non-sampling errors, Sample table of random numbers

Data collection – various kinds of data and secondary data

Using registration and other non-invasive data collection methods

Assignment – Online survey

Building a click-stream database

Using Web crawlers and Bots

Using APIs to access third-party data

Text mining and semantic web techniques

Handling missing data and deriving data attributes

Documenting data and building a metadata layer

Case study.
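The following sketch covers two of the sampling topics above: choosing a sample size and drawing a simple random sample. It assumes a survey that estimates a proportion and an effectively infinite population. A seeded random-number generator plays the role of a printed table of random numbers. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def sample_size_for_proportion(margin_of_error, confidence=0.95, p=0.5):
    """Minimum n to estimate a proportion within +/- margin_of_error.

    Uses n = z^2 * p * (1 - p) / e^2; p = 0.5 is the conservative (largest-n) choice.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)


def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; the seed makes the draw reproducible."""
    rng = random.Random(seed)
    return rng.sample(population, n)


if __name__ == "__main__":
    n = sample_size_for_proportion(0.05)  # +/- 5 points at 95% confidence
    respondent_ids = list(range(1, 10_001))  # hypothetical sampling frame
    print(n, simple_random_sample(respondent_ids, 5, seed=42))
```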

STATISTICAL ANALYSIS 2: REGRESSION MODELS

Correlation and Regression

Role of regression in analytics

Linear regression, assumptions, Inference using the least squares method, Need for diagnostics, Collinearity, Dummy variables, Heteroscedasticity, Autocorrelation, Influential observations, Subset selection, Transformations, Steps involved in regression modeling, Case study. Software used: R.
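The course uses R; the following is only a language-neutral illustration written in Python. It computes least-squares estimates for a single predictor from the textbook formulas b1 = Sxy / Sxx and b0 = y_bar - b1 * x_bar. It also returns the residuals, which most regression diagnostics start from. The function names and data are illustrative assumptions.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares estimates (b0, b1) for y = b0 + b1 * x + error."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(x, y, b0, b1):
    """Residuals e_i = y_i - (b0 + b1 * x_i), used to check model assumptions."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical observations
    b0, b1 = fit_simple_ols(x, y)
    print(b0, b1, residuals(x, y, b0, b1))
```

With an intercept in the model, the least-squares residuals always sum to zero. Plotting them against x or against the fitted values is the usual first check for heteroscedasticity or nonlinearity.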

OPERATIONS 1: SIMULATION

Simulation is the process of designing a model of a system and conducting experiments with it to understand the system's behavior. It can also be used to evaluate strategies by trying “what if” scenarios in an uncertain environment. Modeling & Simulation has become an important tool in several functional areas of business, including finance, marketing, operations management, organizational behavior and strategy. It can be used in all life-cycle phases of a project, including requirements analysis, concept exploration & evaluation, design & development, integration, test & evaluation, and production & sustainment.

The goal of this course is to introduce participants to the principles of simulation and how such models are developed and used in various practical and functional areas. The course will cover Monte Carlo and Discrete Event simulation and related application areas.
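The following Monte Carlo sketch runs a “what if” comparison: the expected profit from different order quantities when demand is uncertain. The scenario is a hypothetical newsvendor problem. The price, unit cost and a normal demand distribution truncated at zero are all illustrative assumptions, not course material.

```python
import random


def simulate_profit(order_qty, price=10.0, unit_cost=4.0, demand_mean=100.0,
                    demand_sd=20.0, trials=20_000, seed=1):
    """Monte Carlo estimate of expected profit for one order quantity."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        demand = max(0.0, rng.gauss(demand_mean, demand_sd))  # uncertain demand, truncated at 0
        sales = min(demand, order_qty)                        # cannot sell more than was ordered
        total += price * sales - unit_cost * order_qty
    return total / trials


if __name__ == "__main__":
    for q in (50, 100, 150, 200):  # "what if" we ordered q units?
        print(q, round(simulate_profit(q), 1))
```

Reusing the same seed for every scenario means each order quantity sees the same simulated demands (common random numbers). This makes the differences between scenarios less noisy than independent runs would.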

DATA VISUALIZATION

Beauty of Data Visualization – what and why

Design concepts – Line charts, Area graphs, etc.

Data exploration and Interactive dashboards

Visualization in a multi-device world – using space effectively

Creating meaning with data – Excel and PowerPoint visualization

Time dimension in data visualization

Assignment 1 presentation – using Excel and PowerPoint in visualization

Advanced Visualization tools (OLAP, Tableau, Spotfire, QlikView, etc.)

Data gathering and data staging for visualization

Using visualization to build data trust

Impacting corporate culture using data visualization and collaborative analysis

Text visualization – tag clouds, keyword weighting, word tree, etc.

Social data analysis

Non-traditional and statistical visualization.

STATISTICAL ANALYSIS 3: ADVANCED STATISTICAL MODELS

Generalized Linear Models: Binary and multinomial logistic regression; Regression Models for Count Data: Poisson regression, Zero-inflated Poisson regression, Negative Binomial regression

Survival analysis: Introduction: Censoring and truncation

Characteristics of survival analysis data: Time-to-event data. Hazard and survival functions

Kaplan-Meier estimate of survival function

Cox proportional hazards (PH) model, estimation and analysis

Extensions

Stratified PH

PH with time-varying covariates

Parametric survival analysis with standard distributions

Accelerated failure time models

Business applications: Customer lifetime estimation.

Design of experiments: Basic concepts: randomization, replication and control

Experimental design for testing differences in several means: Completely randomized and randomized complete block designs

Cross-over designs

Two-level factorial experiments—full and fractional

Plackett-Burman designs

Designs for three or more levels. Taguchi designs. Response surface designs

Business applications: Case-Control designs for campaign evaluation

Designs for conjoint analysis.

Missing value analysis: Missing value patterns: Missing completely at random (MCAR)

Missing at random (MAR)

Missing not at random (MNAR)

List-wise deletion

Pair-wise deletion

Various imputation methods: Hot deck imputation, Mean substitution, Regression imputation, EM imputation
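The following Python sketch implements the Kaplan-Meier estimate listed above for right-censored data, such as customer tenure where some customers are still active. The function name and the example durations are hypothetical. At each distinct event time t, the estimate multiplies the running survival by (1 - d_t / n_t), where d_t is the number of events at t and n_t is the number still at risk just before t.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations: follow-up time for each subject (e.g. months as a customer)
    events:    1 if the event (e.g. churn) was observed, 0 if censored
    Returns a list of (time, S(time)) at each distinct event time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects who share the same time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censorings both leave the risk set after t
    return curve


if __name__ == "__main__":
    # Hypothetical tenures in months; 0 marks customers still active (censored)
    print(kaplan_meier([2, 3, 3, 5, 6, 8], [1, 1, 0, 1, 0, 1]))
```

Censored subjects never lower the curve directly. They only shrink the risk set, which is what distinguishes this estimate from simply counting the share of customers who have left.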

FORECASTING ANALYTICS

Regression and time series paradigms of forecasting, visualization and exploration of regression data, forecasting based on regression models, visualization and exploration of time series data, forecasting based on time series models, evaluating forecast performance, neural networks, introduction to advanced models, forecasting binary variables, use of forecasting software.
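The following sketch illustrates evaluating forecast performance. It shows two simple benchmark forecasts (naive and moving average) and two common error measures (MAE and RMSE). The series would be split in time order, fitting on the earlier part and scoring on a later holdout. The function names, the window length and the data are illustrative assumptions.

```python
import math


def naive_forecast(history, horizon):
    """Repeat the last observed value, a standard benchmark."""
    return [history[-1]] * horizon


def moving_average_forecast(history, horizon, window=3):
    """Flat forecast equal to the mean of the last `window` observations."""
    level = sum(history[-window:]) / window
    return [level] * horizon


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


if __name__ == "__main__":
    series = [20, 22, 21, 23, 25, 27, 26, 28]  # hypothetical monthly sales
    train, holdout = series[:-3], series[-3:]  # time-ordered split, no shuffling
    for name, fc in [("naive", naive_forecast(train, 3)),
                     ("moving average", moving_average_forecast(train, 3))]:
        print(name, mae(holdout, fc), rmse(holdout, fc))
```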

DATA MINING 1: SUPERVISED LEARNING

Classification and prediction

Bayes classification: error probability

Data partitioning and performance evaluation, training set and test set errors, cross-validation

Variable and feature selection

Classification Methods: Discriminant analysis, Nonparametric density-based methods, Naïve Bayes, K-nearest neighbors, Neural nets, Classification trees, Support Vector Machines

Ensemble methods, bagging & boosting

Using Data Mining Software (XLMiner or R)
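The course names XLMiner or R; the following is a tool-neutral Python sketch of two items from the list above, k-nearest neighbors and cross-validation. Each row is assigned to a fold by its index modulo the number of folds. The fold scheme, function names and toy data are illustrative assumptions, and the data set needs at least as many rows as folds.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def k_fold_accuracy(X, y, k_neighbors=3, folds=5):
    """Mean held-out accuracy; row i is placed in fold i % folds."""
    scores = []
    for f in range(folds):
        train = [(X[i], y[i]) for i in range(len(X)) if i % folds != f]
        test = [(X[i], y[i]) for i in range(len(X)) if i % folds == f]
        train_X = [p for p, _ in train]
        train_y = [label for _, label in train]
        correct = sum(knn_predict(train_X, train_y, p, k_neighbors) == label
                      for p, label in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Two hypothetical, well-separated customer segments
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
         (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    y = ["a"] * 5 + ["b"] * 5
    print(k_fold_accuracy(X, y))
```

Every prediction is scored on a row the model did not see during training, which is why cross-validated accuracy is a fairer guide than training-set accuracy.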

DATA MANAGEMENT 2: BIG DATA

The basics of Big Data analytics – what it is, why it is needed, and real-world applications

The fundamentals of the MapReduce programming model for crunching and analyzing Big Data, with hands-on experience using Hadoop

Big Data Text Analytics for understanding and mining large volumes of unstructured text data

Big Data Visualization for finding global trends and local structures in Big Data.
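The following pure-Python sketch shows the idea behind the MapReduce model, using the classic word-count example. It does not use Hadoop's API. A real framework runs many mappers and reducers in parallel across machines and performs the shuffle over the network, whereas here the three phases run in one process. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all counts for one key."""
    return key, sum(values)


def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["Big data big", "data analytics"]))
```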

CONTEMPORARY ANALYTICS 1

Social media: Introduction, History of Social media, Basics of Social Media and Business Models, Basics of Web Search Engines and Digital Advertising.

Digital analytics and digital attribution: Web analytics, Experimental methods in web data analytics, Econometric modeling of search engine ads.

User generated content and social listening: Sentiment Analysis, Word of Mouth, Text Mining of User Generated Content.

Online communities and Social networks I: Measuring the Impact of Social Networks

Online communities and Social networks II: Facebook Insights Data Analysis, Social Media and Viral Marketing, Using STATA

Collective Intelligence and Social Media: Harnessing the Wisdom of Crowds, Contests and Communities, Crowd-sourcing, Crowd-funding

Mobile: Mobile ecosystem, Use of Technology for E-commerce: Impact of IT interventions in web site design on E-commerce

OPERATIONS 2: OPTIMIZATION

This course covers optimization (finding “what’s best” from the available options) and decision analysis (deciding “what now”, that is, what we should do given the information we have from the past). The emphasis is on models that are widely used in diverse industries and functional areas, including operations, finance and marketing. The course will introduce deterministic constrained optimization, network optimization, stochastic models and non-linear optimization.
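The following is a small illustration of deterministic constrained optimization: choosing projects to maximize total value without exceeding a budget. It is a swapped-in technique, not the course's method. Courses like this typically use linear or integer programming solvers; this sketch solves the tiny 0/1 knapsack version exactly by dynamic programming. Integer costs are assumed, and the project data are hypothetical.

```python
def best_portfolio(costs, values, budget):
    """0/1 knapsack by dynamic programming: maximize total value within an integer budget."""
    n = len(costs)
    # best[i][b] = max value achievable using the first i projects with budget b
    best = [[0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        c, v = costs[i - 1], values[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip project i
            if c <= b:
                best[i][b] = max(best[i][b], best[i - 1][b - c] + v)  # option 2: take it
    # Walk back through the table to recover which projects were chosen
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    return best[n][budget], sorted(chosen)


if __name__ == "__main__":
    # Hypothetical projects: costs in lakhs, values as expected returns
    print(best_portfolio(costs=[4, 3, 2, 5], values=[10, 7, 4, 12], budget=9))
```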

CONTEMPORARY ANALYTICS 2 (OR ANALYTICS LEADERSHIP)

Customer Analytics for New Product Development: Segmentation Analytics: Cluster Analysis

Customer Targeting Analytics: Discriminant Analysis and GE Portfolio Matrix

Market Positioning Analytics: Multidimensional Scaling and Factor Analysis

Product/Service Design Analytics and optimal market offering: Conjoint Analysis.

DATA MINING 2: UNSUPERVISED LEARNING

Principal components

Canonical correlations

Measuring data similarity and dissimilarity

Mining Frequent Patterns – association rules

Pattern Mining

Clustering Methods

Clustering High Dimensional Data

Outlier Detection

Applications

Spatio-temporal applications and recommendation systems. Lab work on SQL, SQL Programming and Data Warehouse operations
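The following sketch illustrates mining frequent patterns with association rules. It computes support and confidence for rules between single items. It is deliberately simpler than a full algorithm such as Apriori, which also grows larger itemsets level by level. The thresholds, function names and basket data are illustrative assumptions.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Rules {lhs} -> {rhs} between single items meeting both thresholds.

    confidence(lhs -> rhs) = support({lhs, rhs}) / support({lhs})
    """
    items = sorted(set().union(*map(set, transactions)))
    rules = []
    for a, b in combinations(items, 2):
        s_ab = support(transactions, {a, b})
        if s_ab < min_support:
            continue  # infrequent pair: no rule from it can meet min_support
        for lhs, rhs in ((a, b), (b, a)):
            confidence = s_ab / support(transactions, {lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, s_ab, confidence))
    return rules


if __name__ == "__main__":
    baskets = [["bread", "milk"], ["bread", "butter"],
               ["bread", "milk", "butter"], ["milk"]]  # hypothetical baskets
    for rule in pair_rules(baskets):
        print(rule)
```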

ACTIVE LEARNING PROJECTS (ALP)

Choose two projects from the list of topics (depending on faculty offerings).

Sample list:

Marketing Analytics (product positioning, brand equity assessment, consumer purchasing behavior, the effectiveness of marketing campaigns), Risk Analytics (portfolio analysis,…), Search Analytics, Online Advertising Analytics, Fraud Detection
