Tuesday, December 1, 2015

Engineering starts with a problem to solve.

Not a toolbox, not the know-how, not the big data everyone is crazy about, but a question mark, a dissatisfaction. All engineering starts with a problem to solve.

Reliability engineering, or risk assessment, is no exception. A friend once told me that reliability engineering exists to support a design decision: if there is no choice or trade-off to be made, there is no reliability engineering. Forget this, and it is easy to slide into paper pushing, following the procedure and generating the reports while the design keeps going on its own track, uninterested.

Wednesday, November 25, 2015

Rated R: Recommended Reading by Joseph Rickert

Via Revolution: sentences to ponder ... 
The fact that the dplyr family of packages may make data wrangling more convenient in many circumstances doesn't make a book that teaches data manipulation through base R functions any less relevant. In fact, some might argue that new students should be taught the basic functionality first. I am not a militant traditionalist, but it does seem to me that familiarity with the bare bones basics of the language will help newcomers to gain intuition about how R works.
Here is the list:

Learning R
Advanced R by Hadley Wickham - Anyone who wants to gain a deep understanding of the R language will certainly benefit from this book. More than a reference: the author seeks to provide a conceptual framework for understanding R’s structure and guide readers through R’s idiosyncratic mechanisms pointing out traps, illuminating difficult concepts and providing expert commentary.
The Art of R Programming: A Tour of Statistical Software Design by Norman Matloff – This is still my pick for the best book for people with some programming experience who want to make a serious effort at learning R. Professor Matloff’s interest in teaching the mechanics of programming, infused with his deep understanding of both the underlying computer science and statistical theory, puts this book on top.
Hands-On Programming with R by Garrett Grolemund – If you are not only new to R but new to programming as well, this is the book for you. I have reviewed it more extensively here.
R For Dummies by Andrie de Vries and Joris Meys – A current, concise and insightful reference to core concepts in the R language. A really nice feature of the book is its emphasis on presenting the R ecosystem along with core R concepts. When learning anything new, it is always helpful to understand the big picture. Keep this book by your computer; when you stop referring to it, you will be a pretty good R programmer.
Data Science with R
Applied Predictive Modeling by Max Kuhn and Kjell Johnson – This book is the master text for predictive analytics, carefully walking through several modeling examples and making expert use of the extensive machine learning tools in R’s caret package. I have described the book more fully here.
Data Mining with Rattle and R by Graham Williams – This is the perfect first book for machine learning with R. The rattle GUI helps get across the machine learning concepts and also produces some pretty good R code to get you started.
Data Science in R: A Case Studies Approach to Computational Reasoning and Data Science by Deborah Nolan and Duncan Temple Lang – My most recent acquisition, this book consists of 12 non-trivial case studies organized under three themes: Data Manipulation and Modeling, Simulation Studies, and Data and Web Technologies. All of the data sets are messy, and the projects identify and develop the kind of skills required to undertake open-ended data science projects. The book doesn’t teach R programming, but it shows why R is the appropriate language for doing data science.
Practical Data Science with R by Nina Zumel and John Mount – This book is one of a kind. It moves fluidly between the various stages of the data science process, from surface considerations of working with customers to the deep details of various machine learning algorithms. There is quite a bit of original R code that you can use in real projects. Most impressive is the statistical sensibility of the authors, who want you to make correct inferences from your data and machine learning models as well as effectively communicate your findings to the people paying the bills.

Friday, October 23, 2015

Blogs for Data Analysis

Hat tip to Quora

Here are some that I've found useful:

Friday, June 26, 2015

Why Chess Will Destroy Your Mind - from SciAm in 1859

Via The Message

Interesting reading on how people wrote about the "video game" of their time. The Scientific American essay is mostly right about chess, and some of its complaints fit easily into today's debate about video games.

  • [C]hess is too sedentary a pastime for people who were living increasingly industrialized and sedentary lives.
  • [C]hess-playing prowess doesn’t necessarily transfer to other domains.
  • [C]hess ensorcels its disciples into an addictive loop

However, the gem is here:

 Go too far in that direction and you wind up like Ahab in Moby Dick: Focused, sure, but also a total obsessive. This is precisely the perspective from which this Scientific American author denounces chess. Too much focus, too much devotion and sitting down, can be bad for you. Who’s to say that’s not a healthier balance?
Huh ... how about all the Meditation for Children books?

Friday, May 29, 2015

Correlation does not imply causation - sports department

Why Do Former High-School Athletes Make More Money? - The Atlantic:

I don't have a kid in high school, but I know first hand that extracurriculars require commitment from parents, at least in the elementary school years. And, incidentally, committed parents matter in education ...

We parents joke about it too: whether it is baseball, basketball, swimming or scouts, we almost always meet the same bunch of parents from one activity to another. So my hunch is that extracurriculars in the elementary school years are a good indicator of parental commitment. At the same time, an early start in sports would lead to higher participation in later years.

To measure how much kids actually "acquired" from sports, we need to control for other factors, such as the commitment of their parents.
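To make the confounding argument concrete, here is a toy simulation. Every number in it is a hypothetical assumption, not data from The Atlantic article: parental commitment is a yes/no trait held by half the families, committed parents sign their kids up for sports 80% of the time versus 20% otherwise, and commitment (not sports) adds 15 units to adult income. A naive comparison of athletes with non-athletes should show a gap of roughly 15 × (0.8 − 0.2) = 9 even though sports has no effect at all in this model, while comparing within each commitment group should make the gap all but vanish.

```python
import random
from statistics import mean


def simulate(n=100_000, seed=1):
    """Generate hypothetical (committed, athlete, income) rows.

    Parental commitment drives both sports participation and income;
    sports itself has NO direct effect on income in this toy model.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        committed = rng.random() < 0.5          # assumed 50/50 split of families
        p_sports = 0.8 if committed else 0.2    # committed parents sign kids up more often
        athlete = rng.random() < p_sports
        # Income depends only on commitment plus noise: there is no sports term.
        income = 40 + (15 if committed else 0) + rng.gauss(0, 5)
        rows.append((committed, athlete, income))
    return rows


def gap(rows):
    """Mean income of athletes minus mean income of non-athletes."""
    athletes = [inc for _, ath, inc in rows if ath]
    others = [inc for _, ath, inc in rows if not ath]
    return mean(athletes) - mean(others)


rows = simulate()

# Naive comparison: ignores parental commitment entirely.
naive = gap(rows)

# "Controlling" for commitment: compare athletes and non-athletes
# separately within the committed and the non-committed groups.
within = [gap([r for r in rows if r[0] == c]) for c in (True, False)]

print(f"naive gap: {naive:.1f}")
print("gap within commitment groups:", [round(g, 1) for g in within])
```

Stratifying on the confounder only works here because the simulation records commitment directly; in real survey data, parental commitment is hard to observe, which is exactly why the correlation between high-school sports and later earnings is hard to interpret.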

Thursday, May 28, 2015

R vs. Python, round #???

An R Enthusiast Goes Pythonic! | Data Until I Die!:

I found myself stumbling a lot trying to figure out which Python packages to use for each particular purpose and I tended to get easily frustrated. I had to keep reminding myself that it’s a learning curve to a similar extent as it was for me while I was learning R. This frustration should not be a deterrent from picking it up and learning how to do machine learning in Python.

Thursday, May 21, 2015

Game of Thrones - What's the worst thing that could happen?

The quote from Vox is right on:

"This, honestly, is a problem endemic to the show, and one that may eventually tear it down. On Game of Thrones, suffering isn't something characters go through; it's something the writers visit on the characters. It often seems as if the show starts from the premise of, What's the worst thing that could happen to this person? and then presses that button over and over again."
The show has not gone as deep into its characters as it could have. It makes me feel that the writers care more about the viewers' reaction than about the characters or the story. Suffering becomes a tool to awe the viewers, a way to make money, not exactly a piece of art.