Machine Learning Update
I have been reading Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2nd Edition, by Aurélien Géron and taking Andrew Ng's classes on Coursera. The videos teach excellent concepts that also appear in the book, and I have found myself using the book to work through areas that were confusing in the videos.
The book starts in chapter 1 with the basics of machine learning. These concepts are the foundation for understanding what machine learning is and what types of algorithms are available. I covered the basic types of machine learning algorithms in the post here.
Chapter 2 starts you off by using a regression algorithm to predict housing prices. A typical performance measure for regression is the Root Mean Square Error (RMSE), which shows how much error is in your predictions.
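To make the error measure concrete, here is a minimal sketch of how RMSE (Root Mean Square Error) can be computed in plain Python: square each prediction error, average the squares, and take the square root. The function name `rmse` and the house prices are my own made-up illustration, not code or data from the book.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one value")
    # Square each error so that positive and negative misses do not cancel out.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    # Average the squared errors, then take the root to get back to the original units.
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical house prices in thousands of dollars, invented for this example.
actual_prices = [200.0, 310.0, 150.0, 420.0]
predicted_prices = [210.0, 300.0, 170.0, 400.0]

print(rmse(actual_prices, predicted_prices))  # about 15.8, i.e. sqrt(250)
```

Because the errors are squared before averaging, the two misses of 20 count for more than the two misses of 10, so RMSE penalizes large errors more heavily than a plain average of the misses would.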
Test Data
The book goes into test data and why it is essential to set aside the data early. One area that I did not consider is the subconscious pattern recognition that people do when they look at data. The intent of putting aside test data early is to ensure that one does not unintentionally skew the results by expecting a particular prediction before the algorithm can be run.
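As a rough sketch of what setting data aside can look like, the snippet below shuffles the rows once and holds back a fixed share as the test set. The function name `split_train_test`, the 20% ratio, and the seed are assumptions I chose for illustration; the `houses` list is a stand-in for real rows.

```python
import random


def split_train_test(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of rows and return (train_rows, test_rows)."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    # A fixed seed puts the same rows in the test set on every run,
    # so the test data stays unseen even after many experiments.
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    test_size = int(len(shuffled) * test_ratio)
    return shuffled[test_size:], shuffled[:test_size]


# Hypothetical stand-in for a housing data set: 100 row identifiers.
houses = list(range(100))
train_set, test_set = split_train_test(houses)

print(len(train_set), len(test_set))  # 80 20
```

After the split, all exploring, plotting, and model tuning would happen on `train_set` only, and `test_set` would be left alone until the final check.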
The test data can be created using different methods. When considering how to set aside data for testing, you need to hold back enough data to judge the model fairly, and you must never let the test data leak into training. This is how you catch "overfitting," which is fitting the model too closely to the training data. An overfit model shows high accuracy, but only for the data set that we started with, and it likely will not work well with new data. The held-out test set is what reveals that gap.
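One way to see overfitting is with a deliberately extreme model that just memorizes its training data. The sketch below uses synthetic data and a made-up `memorizer_predict` function (both are my own assumptions, not the book's example): the model scores perfectly on the data it was trained on and noticeably worse on the held-out data.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


rng = random.Random(0)
# Synthetic data invented for illustration: y is roughly 2 * x plus random noise.
points = [(x, 2.0 * x + rng.gauss(0.0, 5.0)) for x in range(50)]
rng.shuffle(points)
train, test = points[:40], points[40:]


def memorizer_predict(x):
    """Return the y of the closest training x, i.e. a model that memorizes the training set."""
    nearest_x, nearest_y = min(train, key=lambda point: abs(point[0] - x))
    return nearest_y


train_error = rmse([y for _, y in train], [memorizer_predict(x) for x, _ in train])
test_error = rmse([y for _, y in test], [memorizer_predict(x) for x, _ in test])

print(f"train RMSE: {train_error:.2f}")  # 0.00, every training point is its own nearest neighbor
print(f"test RMSE:  {test_error:.2f}")   # larger, because the memorized noise does not carry over
```

The training error alone would suggest a perfect model. Only the comparison against data the model has never seen shows that it learned the noise rather than the pattern.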
Experiences So Far
At this point, a couple of questions have come up that I am still searching for answers to.
1. Why do we use a virtual environment when using machine learning?
2. What computing power is actually required to do machine learning? In other words, do you only need powerful computers for more massive sets of data, or is that power necessary for all algorithms?
The most significant area that I am working to understand is the math. I have been following along online and in books, and the math has been more than what you would use in everyday life. I read that this was a large part of machine learning; however, I haven't run across the actual reasoning behind it. So far, the material has explained how the models/algorithms run and use the data that I am working with; however, I haven't found the reasoning behind needing to understand the exact equations, or how I can (and whether I should) manipulate them.
My first guess is that this is part of creating your own machine learning algorithms. If I understand the math, I should be able to tweak or change the parameters to meet my intent if it doesn't align with the algorithms that are used by most systems.
A New Book!
I just received this book for Christmas and have decided to go through it first before continuing with the O'Reilly book. Based on what I have read online, it should give me a broad understanding of the material before I get much deeper into machine learning in the O'Reilly book.