The Python conference PyCon 2014 was held recently and the videos from the conference are online. I have been working my way through the interesting machine learning talks and will share a few of them over the coming weeks.
A great talk if you are starting out in data science or machine learning in Python was given by Melanie Warrick, titled How to Get Started with Machine Learning. It’s about 25 minutes long. The abstract of the talk is:
Provide an introduction to machine learning to clarify what it is, what it’s not and how it fits into this picture of all the hot topics around data analytics and big data.
Computers…ability to learn without… explicit programming
She positions machine learning as the toolkit used in Artificial Intelligence and Data Science. Relatedly, she describes big data as data beyond the ability of common technology to capture and curate. This definition sits well with me. Although the talk is an introduction to machine learning, the focus is on the application of machine learning in data science.
She describes the four main data science roles as data lead, data creative, data developer and data researcher, and uses a graph to indicate how much machine learning each role performs. She also describes a data science project workflow.
Data Science Project Flow by Melanie Warrick.
She provides a cute example of linear regression on a 2D dataset (head size vs brain weight) using scikit-learn. Usefully, she summarizes Python tools in categories:
- Explore data : pandas, statsmodels, matplotlib, numpy, unix
- Build model : scikit-learn, numpy, pandas, scipy
- Test model : scikit-learn, matplotlib
- Data products : API, Flask, Django
- Visualize : D3, matplotlib, vincent and vega, ggplot
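To give a feel for what the linear regression example mentioned above involves, here is a minimal sketch using scikit-learn. It is not the code from the talk: the data is synthetic, and the variable names, units, slope, intercept and noise level are all assumptions I made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data for illustration only; this is NOT the dataset used in the talk.
rng = np.random.default_rng(0)
head_size = rng.uniform(3000, 4500, size=50)   # hypothetical head sizes
true_slope, true_intercept = 0.25, 350.0       # assumed values used to generate the data
noise = rng.normal(0, 20, size=50)
brain_weight = true_slope * head_size + true_intercept + noise

# scikit-learn expects a 2D feature matrix of shape (n_samples, n_features).
X = head_size.reshape(-1, 1)

# Fit an ordinary least squares line: brain_weight ~ slope * head_size + intercept.
model = LinearRegression()
model.fit(X, brain_weight)

slope = model.coef_[0]
intercept = model.intercept_
r2 = model.score(X, brain_weight)              # coefficient of determination on the training data

# Predict the brain weight for a new, hypothetical head size.
predicted = model.predict(np.array([[4000.0]]))[0]

print(f"slope={slope:.3f}, intercept={intercept:.1f}, R^2={r2:.3f}")
print(f"predicted brain weight for head size 4000: {predicted:.1f}")
```

With real data you would load the two columns with pandas instead of generating them, and score the model on held-out data rather than on the data it was fitted to.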