What is BA?

Skills, Tech & Practices for Continuous Iterative Exploration & Investigation to gain insights and drive biz planning
Data → Insights → Biz Planning

Types
 Descriptive
 Insights about past
 Summarise raw data to make it interpretable
 Predictive
 Understand the future
 Provide actionable insights based on past data
 Prescriptive
 Advise possible outcomes
 Attempt to quantify effect of future decisions on outcomes
Descriptive Biz Analytics
 Randomness, population and samples
 Data: Categorical or Numerical
 Sample: Random, Systematic, Stratified or Clustered
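The four sampling schemes can be sketched roughly as below. This is a minimal illustration using only the standard library; the function names, the fixed seed and the choice to start systematic sampling at index 0 are assumptions made for the example, not part of the notes.

```python
import random

def simple_random(population, n, seed=0):
    # Every item has the same chance of being picked.
    return random.Random(seed).sample(population, n)

def systematic(population, n):
    # Take every k-th item; starting at index 0 is a simplification
    # (a random start within the first k items is also common).
    step = len(population) // n
    return population[::step][:n]

def stratified(population, key, n_per_stratum, seed=0):
    # Group items by stratum, then sample randomly inside every stratum.
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for group in strata.values():
        sample.extend(rng.sample(group, min(n_per_stratum, len(group))))
    return sample

def clustered(population, key, n_clusters, seed=0):
    # Pick whole clusters at random and keep every item in them.
    rng = random.Random(seed)
    clusters = {}
    for item in population:
        clusters.setdefault(key(item), []).append(item)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for c in chosen for item in clusters[c]]

population = list(range(100))
print(systematic(population, 10))  # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```

Stratified sampling draws from every group, while clustered sampling keeps only a few groups but takes them whole.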
 Stats
 Central Tendency: Mean, Mode, Median, Midrange, Quartiles
 Dispersion: Range, Var, Std Dev, Interquart Range, Coeff of Var
 Shape: Skewness or Kurtosis
 Exploratory: Five-number summary, box-and-whisker plot
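As a rough illustration of the statistics listed above, the sketch below computes them with Python's standard `statistics` module. The sample values are made up for the example, and the "inclusive" quartile method is one of several conventions, so quartiles from other tools may differ slightly.

```python
import statistics as st

data = [2, 4, 4, 5, 7, 9, 11]  # made-up sample values

# Central tendency
mean = st.mean(data)                       # 6
median = st.median(data)                   # 5
mode = st.mode(data)                       # 4
midrange = (min(data) + max(data)) / 2     # 6.5

# Dispersion
data_range = max(data) - min(data)         # 9
variance = st.variance(data)               # sample variance: 10
std_dev = st.stdev(data)                   # sample standard deviation
q1, q2, q3 = st.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                              # interquartile range: 4.0
coeff_var = std_dev / mean                 # coefficient of variation

# Shape: moment-based skewness (positive = longer right tail)
pop_sd = st.pstdev(data)
skew = sum(((x - mean) / pop_sd) ** 3 for x in data) / len(data)

# Exploratory: the five-number summary behind a box-and-whisker plot
five_number = (min(data), q1, q2, q3, max(data))
print(five_number)  # (2, 4.0, 5.0, 8.0, 11)
```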
 Data Visualisation
 Presentation of data visually to amplify cognition
 Tool: Tableau
 Designing Data Viz Products: Design → Paper&Pencil → Execution
 Extra
 Trend Lines, Forecasting, Clustering, What-If Analysis
Predictive Biz Analytics / Machine Learning
 ML is subset of AI
 What Is:
 Transforming data into knowledge to produce actionable insights
 Gives computers ability to learn without being explicitly programmed
 ML Lifecycle
 Define Goals
 Specify Biz Prob
 Define unit of analysis, prediction target
 Prioritise model criteria
 Data Prep
 Find appropriate data
 Merge data in a single table
 Explore data
 Clean data
 Feature engineering
 Create Model
 Interpret Model
 Implement Model
 Model Types (steps 3-5)
 Supervised
 Numeric Prediction → Regression
 Regression Types
 Linear Regression
 Univariate Linear Reg
 Multivariate Linear Reg
 Non-Linear Reg
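For the univariate linear case, a minimal sketch of an ordinary least squares fit is shown below. The function name `fit_line` and the toy data are assumptions for illustration; real projects would normally use a library such as scikit-learn or statsmodels.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]          # toy data lying exactly on y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)        # 2.0 1.0
```

Multivariate linear regression follows the same idea with one coefficient per feature.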
 Decision Tree
 Built top down from root (contains all instances) to leaf (prediction with smaller SD)
 Homogeneity of a node is measured by Entropy (classification) or Std Dev (regression); 0 = fully homogeneous
 Pros
 Both Reg and Classification
 Easy to Interpret
 Gens Biz Knowledge
 Cons
 Prone to overfitting
 Too sensitive to instances, does not generalise well
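How a regression tree picks a split can be sketched as choosing the threshold that most reduces the standard deviation of the target. The code below is a simplified single-feature illustration; `sd_reduction` and `best_split` are hypothetical helper names, and real implementations add stopping rules and pruning to limit overfitting.

```python
import statistics as st

def sd_reduction(parent, left, right):
    # SD of the parent minus the size-weighted SD of the two children.
    n = len(parent)
    weighted = (len(left) / n) * st.pstdev(left) + (len(right) / n) * st.pstdev(right)
    return st.pstdev(parent) - weighted

def best_split(xs, ys):
    # Try each candidate threshold on one feature and keep the best one.
    best_threshold, best_gain = None, 0.0
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        gain = sd_reduction(ys, left, right)
        if gain > best_gain:
            best_threshold, best_gain = t, gain
    return best_threshold, best_gain

xs = [1, 2, 3, 10, 11, 12]     # toy feature values
ys = [5, 5, 5, 20, 20, 20]     # toy target values
threshold, gain = best_split(xs, ys)
print(threshold, gain)         # 3 7.5
```

Splitting at 3 leaves two leaves whose targets have SD 0, so both are fully homogeneous.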
 Random Forest
 Set of Decision trees
 Train each tree on a different random sample of the training dataset, combine predictions at end
 Deep Learning
 Based on large neural networks
 Learn by example (training set shows examples, connections are made automatically)
 Ensembles
 Superset of random forest
 Set of any number and type of models
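The combining step can be sketched as follows: average the predictions for a numeric target, take a majority vote for a categorical one. The "models" here are hand-written stand-in functions rather than trained models, and `bootstrap_sample` shows only one common way (sampling with replacement) to give each model different training data.

```python
import random
from collections import Counter

def bootstrap_sample(rows, seed=0):
    # Draw len(rows) items with replacement, so each model sees different data.
    rng = random.Random(seed)
    return rng.choices(rows, k=len(rows))

def average_ensemble(models, x):
    # Numeric prediction: mean of the individual predictions.
    preds = [m(x) for m in models]
    return sum(preds) / len(preds)

def vote_ensemble(models, x):
    # Categorical prediction: the most common predicted class wins.
    preds = [m(x) for m in models]
    return Counter(preds).most_common(1)[0][0]

# Hypothetical stand-ins for three trained regressors and three classifiers.
regressors = [lambda x: 2 * x, lambda x: 2 * x + 1, lambda x: 2 * x - 1]
classifiers = [
    lambda x: "high" if x > 5 else "low",
    lambda x: "high" if x > 7 else "low",
    lambda x: "high" if x > 3 else "low",
]

print(average_ensemble(regressors, 10))  # 20.0
print(vote_ensemble(classifiers, 6))     # high
```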
 Evaluation
 Mean Absolute Error
 Mean Squared Error
 R-Squared
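The three regression measures above can be computed as in the sketch below. The function names and the toy values are illustrative assumptions; libraries such as scikit-learn provide equivalent functions.

```python
def mae(y_true, y_pred):
    # Mean Absolute Error: average size of the errors.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    # Mean Squared Error: penalises large errors more heavily.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    # Share of the variance in y_true explained by the model.
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3, 5, 7, 9]   # toy actual values
y_pred = [2, 5, 8, 9]   # toy predictions
print(mae(y_true, y_pred), mse(y_true, y_pred), r_squared(y_true, y_pred))
# 0.5 0.5 0.9
```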
 Categorical Prediction → Classification
 Models
 Logistic Regression
 Instead of fitting the categorical var directly, we predict the probability of each category
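For a two-class problem this idea can be sketched with the logistic (sigmoid) function, which maps a linear score to a probability between 0 and 1. The weight, bias and 0.5 threshold below are hypothetical values chosen for illustration; in practice they are learned from training data.

```python
import math

def predict_proba(x, weight, bias):
    # Probability of the positive class for a single numeric feature x.
    return 1 / (1 + math.exp(-(weight * x + bias)))

def classify(x, weight, bias, threshold=0.5):
    # Turn the probability into a class label.
    return 1 if predict_proba(x, weight, bias) >= threshold else 0

weight, bias = 1.5, -6.0   # hypothetical, not fitted to any data

print(predict_proba(4, weight, bias))   # 0.5
print(classify(6, weight, bias))        # 1
print(classify(2, weight, bias))        # 0
```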
 Decision Tree
 See above (Regression>DecisionTree)
 Random Forest
 See above (Regression>RandomForest)
 Deep Learning
 See above (Regression>DeepLearning)
 Ensembles
 See above (Regression>Ensembles)
 Evaluation
 Confusion Matrix
 True Positive (TP): p classified as p
 False Positive (FP): n classified as p
 True Negative (TN): n classified as n
 False Negative (FN): p classified as n
 Calculate measures based on confusion matrix, Eg. total cost = FP × Cost(FP) + FN × Cost(FN)
 Accuracy = (TP+TN)/Total
 Precision = TP / (TP+FP)
 Recall = TP/(TP+FN)
 F-Measure = (2 × Precision × Recall) / (Precision + Recall)
 Phi Coefficient
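The measures above can be derived from the four confusion-matrix counts as in this sketch. The labels "p" and "n" follow the notes; the function names and sample labels are assumptions, and the code does not guard against division by zero (for example when nothing is predicted positive).

```python
import math

def confusion_counts(actual, predicted, positive="p"):
    # Count TP, FP, TN and FN by comparing each prediction with the truth.
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        # Phi coefficient: correlation between actual and predicted classes.
        "phi": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

actual    = ["p", "p", "p", "p", "n", "n", "n", "n", "n", "n"]
predicted = ["p", "p", "p", "n", "p", "n", "n", "n", "n", "n"]

counts = confusion_counts(actual, predicted)
print(counts)   # (3, 1, 5, 1)
result = metrics(*counts)
```

With these sample labels, accuracy is 0.8 and precision, recall and F-Measure are all 0.75.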
 Unsupervised
 Training data has examples but no outcome labels
 Goal: discovery and pattern finding
 No known outcomes to compare against, so cannot be evaluated the way supervised models are
 Algorithms
 Clustering
 Finds "self-similar" groups of instances
 Centroid: Geometric center of group
 How: Specify or determine centroids
 Beware scaling → Normalise
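A minimal sketch of centroid-based clustering with normalisation first is shown below. It is a bare-bones k-means-style loop with the starting centroids specified by hand; the data, function names and fixed iteration count are assumptions for illustration, not a production implementation.

```python
def min_max_normalise(points):
    # Rescale every feature to [0, 1] so no single feature dominates distances.
    cols = list(zip(*points))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(p, lows, highs))
        for p in points
    ]

def centroid(group):
    # Geometric centre: the mean of each coordinate.
    return tuple(sum(c) / len(group) for c in zip(*group))

def nearest(point, centroids):
    # Index of the closest centroid by squared Euclidean distance.
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])))

def k_means(points, centroids, iterations=10):
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a group ends up empty.
        centroids = [centroid(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, [nearest(p, centroids) for p in points]

# Made-up (income, age) pairs; income would swamp age without scaling.
points = [(20000, 25), (22000, 27), (21000, 26),
          (90000, 60), (95000, 62), (92000, 61)]
scaled = min_max_normalise(points)
centroids, labels = k_means(scaled, centroids=[scaled[0], scaled[3]])
print(labels)   # [0, 0, 0, 1, 1, 1]
```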
 Anomaly Detection
 Association Rules
 Topic Modeling