An intro post on my way to becoming a data scientist, way-to-ds-ep01


Find the Tribes You Belong To

Tribes of ML

Pick a Systematic Process for Predictive Problems

5-step process:

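  1. Define the Problem: formalize the question using Task (T), Experience (E) and Performance (P): a program is said to learn from experience E with respect to tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.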
  2. Prepare Data: 1) collect data: anything can be the input, and the output is raw data; 2) process data: raw data is the input, and the output is structured data; 3) transform data: structured data is the input, and the output is feature data produced by feature scaling, decomposition and aggregation.
  3. Spot Check Algorithms: run a set of standard algorithms to spot which ones perform best on the problem defined in step 1 and deserve further attention. Evaluate them with a train/test split or with k-fold cross-validation, typically using 5 or 10 folds.
  4. Improve Results: 1) algorithm tuning; 2) extreme feature engineering
  5. Present Results: Conclusions (Why+Question+Answer)
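
To make steps 2 and 3 more concrete, here is a minimal sketch of a spot check in plain Python. Everything in it is an assumption made for illustration: the toy dataset from the hypothetical make_toy_data helper, the two candidate algorithms (a majority-class baseline and 1-nearest-neighbour), and the choice of min-max scaling as the feature scaling step. In real work a library such as scikit-learn would usually supply these pieces; the point here is only to show the shape of the loop: scale the features, score every candidate with the same 5-fold cross-validation, then compare.

```python
import random
from collections import Counter
from statistics import mean


def make_toy_data(n=100, seed=0):
    """Hypothetical 2-feature, 2-class dataset standing in for real prepared data."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2
        # Class 1 is shifted away from class 0; feature 2 has a much larger scale.
        x1 = rng.gauss(2.0 * label, 0.5)
        x2 = rng.gauss(200.0 * label, 50.0)
        rows.append(((x1, x2), label))
    rng.shuffle(rows)
    return rows


def min_max_scale(train, test):
    """Rescale features with the minimum and range learned from the training fold only."""
    cols = list(zip(*(x for x, _ in train)))
    low = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]

    def scale(rows):
        return [
            (tuple((v - lo) / s for v, lo, s in zip(x, low, span)), y)
            for x, y in rows
        ]

    return scale(train), scale(test)


def majority_class(train, x):
    """Baseline: always predict the most common training label."""
    return Counter(y for _, y in train).most_common(1)[0][0]


def nearest_neighbour(train, x):
    """1-NN: predict the label of the closest training point (squared Euclidean)."""
    closest = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return closest[1]


def k_fold_accuracy(data, predict, k=5):
    """Mean accuracy over k folds; each fold serves once as the test set."""
    scores = []
    for i in range(k):
        test = data[i::k]
        train = [row for j, row in enumerate(data) if j % k != i]
        train, test = min_max_scale(train, test)
        hits = sum(predict(train, x) == y for x, y in test)
        scores.append(hits / len(test))
    return mean(scores)


data = make_toy_data()
results = {
    "majority_class": k_fold_accuracy(data, majority_class),
    "nearest_neighbour": k_fold_accuracy(data, nearest_neighbour),
}
# Best candidate first: these are the algorithms worth tuning in step 4.
for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.2f}")
```

Scaling inside each fold, rather than once on the full dataset, keeps information from the test fold out of training. The baseline is there so that any candidate that cannot beat it is dropped early.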

Pick a tool

  1. Orange or Weka software for basic operations
  2. Programming in Python to do more

Practice more and more on real problems.