STATISTICAL ANALYSIS 1: ESTIMATION AND TESTING
The main focus of this course is using samples to make inferences about statistical properties of the underlying population, such as means, variances and proportions. Students will learn how to construct interval estimates and how to test hypotheses about these parameters. This methodology will also be extended to the comparison of multiple populations using ANOVA. Students will also learn the crucial differences between data collected from experiments and data collected from observational studies, and their implications for the use of statistical methods. The course will use applications from various functional areas of business to illustrate these concepts.
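As an illustration of interval estimation and hypothesis testing for a mean, the following is a minimal Python sketch using only the standard library. It assumes a large-sample normal approximation (for small samples a t-distribution would be the usual choice); the function names and the daily sales figures are hypothetical and not part of the course material.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Large-sample (normal approximation) confidence interval for a mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


def z_test_mean(sample, mu0):
    """Two-sided z-test of H0: population mean equals mu0. Returns (z, p_value)."""
    n = len(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    z = (statistics.fmean(sample) - mu0) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: daily sales (units) at one store over 30 days
daily_sales = [102, 98, 110, 95, 105, 99, 101, 97, 108, 103,
               96, 104, 100, 107, 94, 109, 102, 98, 106, 101,
               99, 103, 97, 105, 100, 111, 93, 104, 102, 98]

low, high = mean_confidence_interval(daily_sales)
z_stat, p_val = z_test_mean(daily_sales, mu0=100)
print(f"95% CI for mean daily sales: ({low:.2f}, {high:.2f})")
print(f"Test of H0: mean = 100 -> z = {z_stat:.2f}, p = {p_val:.3f}")
```

A small p-value would suggest that the hypothesized mean of 100 is not well supported by the sample.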
DATA MANAGEMENT 1
Databases and Database Users, Database System Concepts |
Data Modeling using Entity Relationship Model |
Relational Data Model and Relational Database Constraints |
Relational Database Design |
SQL-99: Schema Definition, Constraints, Queries and Views, SQL Programming Techniques |
Data Warehouse: concepts & ETL
Modeling Data Warehouses |
Data Cubes, operations and query exploration |
Applications. |
DATA COLLECTION |
Introduction – sources of data |
Types of data – structured, unstructured, qualitative, quantitative |
Database architecture and data gathering process |
Overview of an online survey/research project – Phases, purpose, problem statement, conceptualization, execution, measurement |
Sampling – Sampling terminology, Kinds of sampling, Sample size, Sampling and non-sampling errors, Sample table of random numbers
Data collection – various kinds of data and secondary data |
Using registration and other non-invasive data collection methods
Assignment – Online survey |
Building a click-stream database |
Using Web crawlers and Bots |
Using APIs to access third-party data
Text mining and semantic web techniques |
Handling missing data and deriving data attributes |
Documenting data and building a metadata layer |
Case study.
STATISTICAL ANALYSIS 2: REGRESSION MODELS |
Correlation and Regression |
Role of regression in analytics |
Linear regression, assumptions, Inference using least squares method, Need for diagnostics, Collinearity, Dummy variables, Heteroscedasticity, Autocorrelation, Influential observations, Subset selection, Transformations, Steps involved in regression modeling, Case study. Software used is R.
OPERATIONS 1: SIMULATION |
Simulation is the process of designing a model of a system and conducting experiments to understand the behavior of the system. It can also be used to evaluate various strategies by trying “what if” scenarios in an uncertain environment. Modeling & Simulation has become an important tool in several functional areas of business, including finance, marketing, operations management, organizational behavior, and strategy. It can be used within all life cycle phases of a project, including requirements analysis, concept exploration & evaluation, design & development, integration and test & evaluation, and production & sustainment.
The goal of this course is to introduce participants to the principles of simulation and how such models are developed and used in various practical and functional areas. The course will cover Monte Carlo and Discrete Event simulation and related application areas. |
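To give a flavor of Monte Carlo simulation for “what if” analysis, here is a minimal Python sketch that estimates average profit for different order quantities when demand is uncertain. The cost, price and demand distribution (normal, mean 100, standard deviation 20) are hypothetical assumptions chosen only for illustration.

```python
import random


def simulate_profit(order_qty, n_trials=10_000, seed=42):
    """Estimate average profit for a given order quantity under random demand."""
    rng = random.Random(seed)           # fixed seed so runs are reproducible
    unit_cost, unit_price = 6.0, 10.0   # hypothetical cost and selling price
    total = 0.0
    for _ in range(n_trials):
        # Hypothetical demand model: normal(100, 20), truncated at zero
        demand = max(0.0, rng.gauss(100, 20))
        sold = min(order_qty, demand)   # cannot sell more than was ordered
        total += unit_price * sold - unit_cost * order_qty
    return total / n_trials


# "What if" comparison of three candidate order quantities
for qty in (80, 100, 120):
    print(qty, round(simulate_profit(qty), 2))
```

Each trial draws one possible demand outcome; averaging over many trials approximates the expected profit of each strategy.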
DATA VISUALIZATION |
Beauty of Data Visualization – what and why |
Design concepts – Line charts, Area graphs, etc |
Data exploration and Interactive dashboards |
Visualization in a multi-device world – using space effectively |
Creating meaning with data – Excel and PowerPoint visualization |
Time dimension in data visualization |
Assignment 1 presentation – using Excel and PowerPoint in visualization
Advanced Visualization tools (OLAP, Tableau, Spotfire, QlikView, etc)
Data gathering and data staging for visualization |
Using visualization to build data trust |
Impacting corporate culture using data visualization and collaborative analysis |
Text visualization – tag clouds, keyword weighting, word tree, etc |
Social data analysis |
Non-traditional and statistical visualization. |
STATISTICAL ANALYSIS 3: ADVANCED STATISTICAL MODELS |
Regression Models for Count Data: Generalized Linear Models: Binary and multinomial logistic regressions, Poisson regression, Zero-inflated Poisson regression, Negative Binomial regression |
Survival analysis: Introduction: Censoring and truncation |
Characteristics of survival analysis data: Time-to-event data. Hazard and survival functions |
Kaplan-Meier estimate of survival function |
Cox proportional hazards (PH) model, estimation and its analysis
Extensions
Stratified PH
PH with time-varying covariates
Parametric survival analysis with standard distributions |
Accelerated failure time models |
Business applications: Customer lifetime estimation. |
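The Kaplan-Meier estimate listed above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical customer-lifetime setting where durations are months until churn and a customer who is still active at the end of observation is treated as censored; the function name and data are not from the course material.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct event time.

    durations: time until the event or until censoring
    observed:  True if the event (e.g. churn) occurred, False if censored
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t
        at_risk = sum(1 for d in durations if d >= t)
        # Events that occur exactly at time t
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical data: months as a customer, and whether churn was observed
months = [2, 3, 3, 5, 8, 8, 12]
churned = [True, True, False, True, True, False, True]

for t, s in kaplan_meier(months, churned):
    print(f"month {t}: estimated survival {s:.3f}")
```

Censored customers contribute to the at-risk count up to their last observed month but never count as events.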
Design of experiments: Basic concepts: randomization, replication and control |
Experimental design for testing differences in several means: Completely randomized and randomized complete block designs |
Cross-over designs |
Two-level factorial experiments—full and fractional |
Plackett-Burman designs |
Designs for three or more levels. Taguchi designs. Response surface designs |
Business applications: Case-Control designs for campaign evaluation |
Designs for conjoint analysis. |
Missing value analysis: Missing value patterns: Missing completely at random (MCAR) |
Missing at random (MAR) |
Missing not at random (MNAR) |
List-wise deletion |
Pair-wise deletion |
Various imputation methods: Hot deck imputation, Mean substitution, Regression imputation, EM imputation |
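Two of the simplest approaches named above, list-wise deletion and mean substitution, can be sketched in Python as follows. This is a minimal illustration on a hypothetical customer table where missing values are represented by None; the function and column names are assumptions.

```python
import statistics


def listwise_delete(rows):
    """Keep only the rows that have no missing value in any column."""
    return [r for r in rows if all(v is not None for v in r.values())]


def mean_substitute(rows, column):
    """Replace missing values in one column with the mean of its observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = statistics.fmean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


# Hypothetical customer records with missing entries
customers = [
    {"age": 34, "spend": 120.0},
    {"age": None, "spend": 80.0},
    {"age": 45, "spend": None},
    {"age": 29, "spend": 100.0},
]

complete_cases = listwise_delete(customers)
age_filled = mean_substitute(customers, "age")
print(len(complete_cases), "complete rows")
print([r["age"] for r in age_filled])
```

List-wise deletion shrinks the sample, while mean substitution keeps every row but understates the variability of the imputed column.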
FORECASTING ANALYTICS |
Regression and time series paradigms of forecasting, visualization and exploration of regression data, forecasting based on regression models, visualization and exploration of time series data, forecasting based on time series models, evaluating forecast performance, neural networks, introduction to advanced models, forecasting binary variables, use of forecasting software. |
DATA MINING 1: SUPERVISED LEARNING |
Classification and prediction |
Bayes classification: error probability
Data partitioning and performance evaluation, training set and test set errors, cross-validation |
Variable and feature selection |
Classification Methods: Discriminant analysis, Nonparametric density-based methods, Naïve Bayes, K-nearest neighbors, Neural nets, Classification trees, Support Vector Machines
Ensemble methods, bagging & boosting |
Using Data Mining Software (XLMiner or R)
DATA MANAGEMENT 2: BIG DATA |
The basics of Big Data analytics – what is it? Why is it needed? Real-world applications
The fundamentals of the MapReduce programming model to crunch and analyze Big Data and hands-on experience on using Hadoop |
Big Data Text Analytics for understanding and mining large volumes of unstructured text data
Big Data Visualization for finding global trends and local structures in Big Data. |
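The MapReduce programming model mentioned above can be illustrated with the classic word-count example. The sketch below runs in a single Python process and does not use Hadoop; it only mimics the map, shuffle and reduce phases, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


print(word_count(["big data", "Big text data"]))
```

In a real cluster the map and reduce calls would run in parallel on different machines, with the framework handling the shuffle between them.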
CONTEMPORARY ANALYTICS 1 |
Social media: Introduction, History of Social media, Basics of Social Media and Business Models, Basics of Web Search Engines and Digital Advertising. |
Digital analytics and digital attribution: Web analytics, Experimental methods in web data analytics, Econometric modeling of search engine ads. |
User generated content and social listening: Sentiment Analysis, Word of Mouth, Text Mining of User Generated Content. |
Online communities and Social networks I: Measuring the Impact of Social Networks |
Online communities and Social networks II: Facebook Insights Data Analysis, Social Media and Viral Marketing, Using Stata
Collective Intelligence and Social Media: Harnessing the Wisdom of Crowds, Contests and Communities, Crowd-sourcing, Crowd-funding |
Mobile: Mobile ecosystem, Use of Technology for E-commerce: Impact of IT interventions in web site design on E-commerce
OPERATIONS 2: OPTIMIZATION |
This course covers optimization (finding “what’s best” from the available options) and decision analysis (deciding the “what now”, in the sense of what we should do given the information we had in the past). The emphasis is on models that are widely used in diverse industries and functional areas, including operations, finance and marketing. The course will introduce deterministic constrained optimization, network optimization, stochastic models and non-linear optimization.
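A small deterministic constrained optimization problem can be sketched as follows. This Python example solves a hypothetical product-mix problem by exhaustive search over whole-number production plans; it is not a linear programming solver, and the profit coefficients and resource limits are assumptions made up for illustration.

```python
def best_product_mix(max_units=50):
    """Exhaustively search integer plans (x, y) for the highest feasible profit.

    Hypothetical problem:
      maximize   30*x + 50*y          (profit per unit of products X and Y)
      subject to 2*x + 4*y <= 100     (labor hours available)
                 3*x + 2*y <= 90      (material units available)
                 x, y >= 0 and integer
    """
    best = (0, 0, 0)  # (profit, x, y); producing nothing is always feasible
    for x in range(max_units + 1):
        for y in range(max_units + 1):
            feasible = 2 * x + 4 * y <= 100 and 3 * x + 2 * y <= 90
            if feasible:
                profit = 30 * x + 50 * y
                if profit > best[0]:
                    best = (profit, x, y)
    return best


profit, x_units, y_units = best_product_mix()
print(f"Make {x_units} of X and {y_units} of Y for a profit of {profit}")
```

Exhaustive search is only practical for tiny problems; larger models of this kind are normally handled with dedicated optimization solvers.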
CONTEMPORARY ANALYTICS 2 (OR ANALYTICS LEADERSHIP) |
Customer Analytics for New Product Development: Segmentation Analytics: Cluster Analysis |
Customer Targeting Analytics: Discriminant Analysis and GE Portfolio Matrix: Market Positioning Analytics: Multidimensional Scaling and Factor Analysis |
Product/Service Design Analytics and optimal market offering: Conjoint Analysis. |
DATA MINING 2: UNSUPERVISED LEARNING |
Principal components |
Canonical correlations |
Measuring data similarity and dissimilarity |
Mining Frequent Patterns – association rules |
Pattern Mining |
Clustering Methods |
Clustering High Dimensional Data |
Outlier Detection |
Applications |
Spatio-temporal, recommendation systems. Lab work on SQL, SQL Programming and Data Warehouse operations |
ACTIVE LEARNING PROJECTS (ALP) |
Choose two projects from the list of topics (depending on faculty offering).
Sample list: |
Marketing Analytics (product positioning, brand equity assessment, consumer purchasing behavior, the effectiveness of marketing campaigns), Risk Analytics (portfolio analysis,…), Search Analytics, Online Advertising Analytics, Fraud Detection |