What is Business Analytics (BA)?
Skills, Tech & Practices for Continuous Iterative Exploration & Investigation to gain insights and drive biz planning
Data → Insights → Biz Planning
Types
- Descriptive
- Insights about past
- Summarise raw data to make interpretable
- Predictive
- Understand the future
- Provide actionable insights based on past data
- Prescriptive
- Advise on possible outcomes
- Attempt to quantify effect of future decisions on outcomes
Descriptive Biz Analytics
- Randomness, population and samples
- Data - Categorical or Numerical
- Sample - Random, Systematic, Stratified or Clustered
- Stats
- Central Tendency: Mean, Mode, Median, Midrange, Quartiles
- Dispersion: Range, Var, Std Dev, Interquart Range, Coeff of Var
- Shape: Skewness or Kurtosis
- Exploratory: Five number summary, box-and-whiskers
- Data Visualisation
- Presentation of data visually to amplify cognition
- Tool: Tableau
- Designing Data Viz Products: Design → Paper&Pencil → Execution
- Extra
- Trend Lines, Forecasting, Clustering, What-If Analysis
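
The central tendency, dispersion and exploratory measures listed under Stats can be sketched with Python's standard `statistics` module. This is a minimal illustration, not a prescribed method: the helper name `describe` and the sample values are made up for the example, and quartiles use the "inclusive" interpolation method (other conventions give slightly different quartiles).

```python
import statistics


def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    data = sorted(values)
    # Quartiles via linear interpolation; "inclusive" treats the data as the
    # whole population range (an assumption, other conventions exist).
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    mean = statistics.mean(data)
    std_dev = statistics.stdev(data)  # sample standard deviation (n - 1)
    return {
        # Central tendency
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "midrange": (data[0] + data[-1]) / 2,
        # Dispersion
        "range": data[-1] - data[0],
        "variance": statistics.variance(data),  # sample variance (n - 1)
        "std_dev": std_dev,
        "iqr": q3 - q1,
        "coeff_of_var": std_dev / mean,
        # Exploratory: min, Q1, median, Q3, max (what a box plot draws)
        "five_number_summary": (data[0], q1, q2, q3, data[-1]),
    }


# Hypothetical sample, e.g. units sold per day over one week.
summary = describe([2, 4, 4, 5, 7, 9, 11])
for name, value in summary.items():
    print(f"{name}: {value}")
```

Sample (n - 1) variance and standard deviation are used here on the assumption that the data is a sample from a larger population; `statistics.pvariance` and `statistics.pstdev` are the population equivalents.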
Predictive Biz Analytics / Machine Learning
- ML is subset of AI
- What Is:
- Transforming data into knowledge to produce actionable insights
- Gives computers ability to learn without being explicitly programmed
- ML Lifecycle
- Define Goals
- Specify Biz Prob
- Define unit of analysis, prediction target
- Prioritise model criteria
- Data Prep
- Find appropriate data
- Merge data in a single table
- Explore data
- Clean data
- Feature engineering
- Create Model
- Interpret Model
- Implement Model
- Model Types (step 3-5)
- Supervised
- Numeric Prediction → Regression
- Regression Types
- Linear Regression
- Univariate Linear Reg
- Multivariate Linear Reg
- Non-Linear Reg
- Decision Tree
- Built top down from root (contains all instances) to leaf (prediction with smaller SD)
- Homogeneity of each node is measured by Entropy (classification) or Std Dev (regression); 0 = fully homogeneous
- Pros
- Both Reg and Classification
- Easy to Interpret
- Generates Biz Knowledge
- Cons
- Prone to overfitting
- Too sensitive to instances, does not generalise well
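
For a regression tree, the idea of splitting towards leaves with smaller SD can be sketched as a "standard deviation reduction" score: SD of the parent node minus the size-weighted SD of the child nodes. This is only an illustrative sketch; the function name `sd_reduction` and the target values are hypothetical, and real tree learners add stopping rules and pruning to limit overfitting.

```python
import statistics


def sd_reduction(targets, groups):
    """SD of the parent node minus the size-weighted SD of its child groups."""
    total = len(targets)
    weighted_child_sd = sum(
        len(group) / total * statistics.pstdev(group) for group in groups
    )
    return statistics.pstdev(targets) - weighted_child_sd


# Hypothetical prediction targets at one node of the tree.
targets = [10, 12, 30, 32]

# Candidate split A separates low values from high values.
good_split = sd_reduction(targets, [[10, 12], [30, 32]])

# Candidate split B mixes low and high values in both children.
bad_split = sd_reduction(targets, [[10, 32], [12, 30]])

# The tree would pick the split with the larger reduction.
print(f"split A reduction: {good_split:.3f}")
print(f"split B reduction: {bad_split:.3f}")
```

A child with SD of 0 is fully homogeneous, so the larger the reduction, the closer the split brings the node to pure leaves.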
- Random Forest
- Set of Decision trees
- Train each tree on a different random sample of the training dataset, combine predictions at end (average or majority vote)
- Deep Learning
- Based on large neural networks
- Learn by example (training set shows examples, connections are made automatically)
- Ensembles
- Superset of random forest
- Set of any number and type of models
- Evaluation
- Mean Absolute Error
- Mean Squared Error
- R-Squared
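
A univariate linear regression and the three evaluation measures above can be sketched in plain Python. This is a minimal sketch under assumptions: the helper names `fit_line` and `regression_metrics` and the data points are invented for illustration, and the fit uses ordinary least squares in closed form.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def regression_metrics(actual, predicted):
    """Mean Absolute Error, Mean Squared Error and R-Squared."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum(e * e for e in errors)
    r_squared = 1 - ss_residual / ss_total
    return {"mae": mae, "mse": mse, "r2": r_squared}


# Hypothetical data: x = ad spend, y = sales.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
predictions = [slope * x + intercept for x in xs]
metrics = regression_metrics(ys, predictions)
print(slope, intercept, metrics)
```

Scoring on the same data used for fitting, as done here for brevity, overstates model quality; in practice the metrics would be computed on held-out data.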
- Categorical Prediction → Classification
- Models
- Logistic Regression
- Instead of fitting the categorical var directly, we predict the probability of each category
- Decision Tree
- See above (Regression>DecisionTree)
- Random Forest
- See above (Regression>RandomForest)
- Deep Learning
- See above (Regression>DeepLearning)
- Ensembles
- See above (Regression>Ensembles)
- Evaluation
- Confusion Matrix
- True Positive (TP) - p classified p
- False Positive (FP) - n classified p
- True Negative (TN) - n classified n
- False Negative (FN) - p classified n
- Calculate measures based on confusion matrix, e.g. Total Cost = FP × Cost(FP) + FN × Cost(FN)
- Accuracy = (TP+TN)/Total
- Precision = TP / (TP+FP)
- Recall = TP/(TP+FN)
- F-Measure = (2 × Precision × Recall) / (Precision + Recall)
- Phi Coefficient
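
The confusion matrix and the measures derived from it can be sketched as follows. The function names `confusion_counts` and `classification_metrics` and the label lists are hypothetical, chosen only for illustration; the Phi Coefficient is computed with the standard formula (TP×TN − FP×FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

```python
import math


def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN, FN for a binary classification."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # p classified p
        elif p == positive:
            fp += 1  # n classified p
        elif a == positive:
            fn += 1  # p classified n
        else:
            tn += 1  # n classified n
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Derive the usual measures; assumes no denominator is zero."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        "phi": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Hypothetical labels: 1 = positive class, 0 = negative class.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(actual, predicted)
scores = classification_metrics(tp, fp, tn, fn)
print((tp, fp, tn, fn), scores)
```

A production version would guard against empty denominators, for example when the model never predicts the positive class.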
- Unsupervised
- Training data has examples but no outcome (labels)
- Goal - discovery and pattern finding
- No known outcomes, so results cannot be evaluated against a correct answer
- Algorithms
- Clustering
- Finds "self-similar" groups of instances
- Centroid: Geometric center of group
- How: Specify or determine centroids
- Beware scaling → Normalise
- Anomaly Detection
- Association Rules
- Topic Modeling
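
The clustering notes (centroids, "Beware scaling → Normalise") can be sketched with a small k-means style loop. This is an assumption-laden illustration rather than a reference implementation: k-means is one common centroid-based algorithm, the helper names and the age/income data are invented, and the initial centroids are specified by hand to keep the run deterministic.

```python
import math


def min_max_normalise(points):
    """Rescale every feature to the range 0..1 so no feature dominates."""
    columns = list(zip(*points))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(point, lows, highs)
        )
        for point in points
    ]


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, initial_centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for point in points:
            groups[nearest(point, centroids)].append(point)
        # Centroid = geometric center (mean of each coordinate) of its group;
        # an empty group keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(group) for col in zip(*group)) if group else old
            for group, old in zip(groups, centroids)
        ]
    return centroids, [nearest(point, centroids) for point in points]


# Hypothetical customers as (age, income); income has a much larger scale.
raw = [(25, 30000), (27, 32000), (60, 31000), (62, 29000)]
scaled = min_max_normalise(raw)

_, raw_labels = k_means(raw, [raw[0], raw[3]])
_, scaled_labels = k_means(scaled, [scaled[0], scaled[3]])
print("without normalising:", raw_labels)
print("with normalising:   ", scaled_labels)
```

On the raw data the income column dominates the distance, so the grouping follows income alone; after normalising, age contributes equally and the two age groups are separated.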