
Foto Friday – Pika Rocky Mountain National Park

Pika Rocky Mountain National Park

While in Rocky Mountain National Park last year, I stumbled upon an area perfect for Pikas. I sat myself down next to some rocks and waited. After about 15 minutes I started hearing the ‘squeaks’ that you’d hear from these cute little animals. Quickly thereafter, I started seeing them scurrying around and spent about an hour grabbing photos of them. Below is one of those cute Pikas.

About Pikas:

A key characteristic of the American pika is its temperature sensitivity; death can occur after brief exposures to ambient temperatures greater than 77.9 °F. Therefore, the species is found at progressively higher elevations toward the southern extents of its distribution. In Canada, populations occur from sea level to 9,842 feet, but in New Mexico, Nevada, and southern California, populations rarely exist below 8,202 feet.

You can learn more about these great little animals here.

See more photos in my 500px portfolio. If you like my photography, feel free to support my addiction habit by purchasing a copy for your wall and/or visiting Amazon (affiliate link) to purchase new or used photographic gear.

 

Pika Rocky Mountain National Park – Buy a copy for your wall

Pika - Rocky Mountain National Park

A closeup of a Pika that I found in Rocky Mountain National Park. Captured with a Sony a9 and a Sony 100-400 GM + 1.4x extender.

Do you need machine learning? Maybe. Maybe Not.

I’ve recently written about the risks of machine learning (ML), but with this post I wanted to take a step back and talk about ML in general. I want to talk about the ‘why’ of machine learning and whether you and/or your company should be investigating machine learning. Do you need machine learning? Maybe. Maybe not.

The first question you have to ask yourself (and then answer) is this:  Why do you want to be involved with machine learning? What problem(s) are you really trying to solve?  Are you trying to forecast revenue for next quarter? You can probably do just fine with standard time series modeling techniques.  Are you trying to predict house prices in cities/neighborhoods around the world? Machine learning is probably a good idea.

I use this rule of thumb when talking to clients about machine learning:

  • If you are trying to forecast something with a small number of values / features – start with standard forecasting / modeling techniques.  You can always move on to machine learning after working through the standard approaches.
  • If you need to combine multiple data sets to create new knowledge and actionable insights, you probably don’t need machine learning.
  • If you have a complex model / algorithm with many features, then machine learning is something to consider.

The key here is ‘complex’.

Sure, machine learning can be applied to simple problems but there are plenty of other approaches that might be just as good. Take the revenue forecasting example – there are multitudes of time series forecasting techniques you can use to create these forecasts. Even if you have hundreds of product lines, you are most likely using a few ‘features’ to forecast one outcome, which can easily be handled by Holt-Winters, ARIMA and other time-series forecasting techniques. You could throw this same problem at an ML algorithm / method and possibly get slightly better (or worse) results but the amount of time and effort to implement an ML approach may be wasted.
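To give a sense of how little machinery a ‘standard’ approach needs, here is a minimal sketch of additive Holt-Winters in plain Python. The quarterly revenue numbers, the smoothing parameters and the function name are all made up for illustration, and the initialization is a deliberately simple one. In practice you would likely reach for a tested library implementation and fit the parameters rather than hand-pick them.

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Additive Holt-Winters sketch: returns `horizon` forecasts past the data."""
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of data")

    # Simple initialization from the first two seasons:
    # level = mean of season one, trend = average per-step change between
    # the two seasons, seasonal = season one's deviations from its mean.
    first = series[:season_len]
    second = series[season_len:2 * season_len]
    level = sum(first) / season_len
    trend = (sum(second) - sum(first)) / season_len ** 2
    seasonal = [value - level for value in first]

    # Walk through the remaining observations, updating each component.
    for t in range(season_len, len(series)):
        value = series[t]
        season = seasonal[t % season_len]
        prev_level = level
        level = alpha * (value - season) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season_len] = gamma * (value - level) + (1 - gamma) * season

    # Forecast = last level + extrapolated trend + matching seasonal effect.
    n = len(series)
    return [
        level + (h + 1) * trend + seasonal[(n + h) % season_len]
        for h in range(horizon)
    ]


# Hypothetical quarterly revenue (three years), purely for illustration.
quarterly_revenue = [110, 125, 150, 130, 118, 134, 161, 139, 127, 143, 172, 149]
forecast = holt_winters_additive(
    quarterly_revenue, season_len=4, alpha=0.4, beta=0.1, gamma=0.3, horizon=4
)
print([round(value, 1) for value in forecast])
```

That is the whole model: three running numbers and three knobs. If something this small gets you a usable forecast, that is a good sign you don’t need to jump to ML yet.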

Where you get the most value from machine learning is when you have a problem that really vexes you. The problem is so complex that you just don’t know where to start. THAT is when you reach for machine learning.

Do you really need machine learning?

There are a LOT of people that will immediately tell you ‘yes!’ when asked if you should be investigating ML. They are also the people that are trying to sell you ML / AI services and/or platforms. They are the people that have jumped on the bandwagon and are chasing the latest buzzwords in the marketplace. In 2 years, those same people will be jumping up and down telling you that you need to implement whatever is at the top of the buzzword queue at the time. They are the same people that were telling you that you needed to implement a data warehouse and business intelligence platforms in the past. Don’t get me wrong – data warehouses and business intelligence have their places but they weren’t right for every organization and/or every problem.

Do you need machine learning? Maybe.

Do you have a complex stream of data that you need to process and turn into knowledge and actionable intelligence? Definitely look into machine learning.

Do you need machine learning? Maybe not.

If you want to ‘do’ machine learning because everyone else is, feel free to investigate it and start building up your skills but don’t throw an enormous budget at it until you know beyond a shadow of a doubt that you need machine learning.

Or you could call me. I can help you figure out if you really need machine learning.

Photo by marc liu on Unsplash

Foto Friday – Rocky Mountain National Park

Late last year I had the opportunity to spend a week in Rocky Mountain National Park (RMNP).  Strangely, I’d never actually been to RMNP although I’ve been just about everywhere else around RMNP.

The visit was part of a trip to nearby Denver for a conference so I didn’t get as much time to spend in the park and surrounding areas as I’d wanted to, but I did spend every morning in the park for sunrise – and loved every second that I had there. I’d found a few locations during pre-trip research and got a couple of really good sunrise shots but didn’t get as many opportunities for Elk as I wanted. That said, I did get surprising access to multiple Moose during the trip as well as a few Pika.

Before we get into the trip photos, let me share the gear I used on the trip. If you want to know more about the gear, let me know and I can share my thoughts.

Now, onto the photos. If you would like to purchase a copy (or copies) of any of these photos, check out my portfolio site.

Sunrise and Fall Colors

Sunrise over Sprague Lake in Rocky Mountain National Park.

Red Sunrise

Sometimes, a quick ‘snap’ of the camera turns into something special. While I was walking around the lake after sunrise, I grabbed this quick snap, which turned out much better than expected.

Black & White Lake

Sprague Lake in Rocky Mountain National Park with a black and white treatment

Rocky Mountain Pika

While in Rocky Mountain National Park, I knew I wanted to find some Pikas. I was lucky and found a perfect habitat for them without much hiking. This is the outcome of my first visit.

The colors of sunrise

While wandering around Rocky Mountain National Park (RMNP) I found this spot and thought it’d be a good place for a sunrise photo. There weren’t a lot of clouds that morning but I did get some fog that rolled in while the sun was rising. The fog plus the few clouds with color add some interest to this photograph.

Moose in the Morning

While at Rocky Mountain National Park, I had the chance to photograph a few moose. While walking down the road toward where a lot of folks said some moose had been spotted, I noticed this Bull Moose standing in the trees perfectly lit by the sunlight.

Moon over the Rockies

I went out to Sprague Lake in RMNP to capture sunrise, hoping that the clouds would stick around. While setting up, I took a couple of shots while the moon was out... and it turned out the moon shots were so much better than the sunrise photos (the clouds disappeared before the sun came up).

See more of my photography here.

The Data Mining Trap

A photo of a lobster trap by the sea

In a post titled Data Mining – A Cautionary Tale, I share the idea that data mining can be dangerous by sharing the story of Cornell’s Brian Wansink, who has had multiple papers retracted due to various data mining methods that aren’t quite ethical (or even correct).

Recently, Gary Smith over at Wired wrote an article called The Exaggerated Promise of so-called Unbiased Data Mining with another good example of the danger of data mining.

In the article, Gary writes of a time that noted physicist and Nobel Laureate Richard Feynman gave his class an exercise to determine the probability of seeing a specific license plate in the parking lot on the way into class (he gave them a specific example of a license plate). The students worked on the problem and determined that the probability was less than 1 in 17 million that Feynman would see a specific license plate.

According to Smith, what Feynman didn’t tell the students was that he had seen the specific license plate that morning in the parking lot before coming to class, so the probability was actually 1. Smith calls this the ‘Feynman Trap.’
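For what it’s worth, a number in that ballpark falls out of simple counting. The sketch below assumes a plate format of three letters followed by three digits. That format is my assumption to make the arithmetic concrete, not something stated in the article, and it also assumes every plate is equally likely.

```python
# Assumed plate format (not from the article): 3 letters followed by 3 digits.
LETTERS_IN_ALPHABET = 26
DIGITS = 10


def plate_combinations(n_letters=3, n_digits=3):
    """Count distinct plates for the assumed letters-then-digits format."""
    return LETTERS_IN_ALPHABET ** n_letters * DIGITS ** n_digits


total_plates = plate_combinations()

# Probability of naming one specific plate BEFORE walking through the lot,
# assuming every plate is equally likely to be there.
probability_before = 1 / total_plates

# Probability once the plate has already been seen: it is no longer a
# prediction, just a report of something that happened.
probability_after = 1.0

print(f"{total_plates:,} possible plates")
print(f"before looking: 1 in {total_plates:,}")
print(f"after looking:  {probability_after}")
```

The arithmetic is the same in both cases. The only thing that changes is whether the ‘prediction’ was made before or after looking at the data.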

Whether or not this story is true (I don’t recall ever reading it from Feynman directly, although he does have a quote about license plates), it’s a very good description of one of the dangers of data mining: knowing what the answer will be before starting the work. In other words, bias.

Bias is everywhere in data science. Some say there are 8 types of bias (not sure I completely agree with 8 as the number, but it’s as good a place to start as anywhere else). The key is knowing that bias exists, how it exists and how to manage that bias. You have to manage your own bias as well as any bias that might be inherent in the data that you are analyzing. Bias is hard to overcome but knowing it exists makes it easier to manage.

The Data Mining Trap

The ‘Feynman Trap’ (i.e., bias) is a really good thing to keep in mind whenever you do any data analysis. Thinking back to the story shared in Data Mining – A Cautionary Tale about Dr. Wansink, he was absolutely biased in just about everything he did in the research that was retracted. He had an answer that he wanted to find and then found the data to support that answer.

There’s the trap. Rather than going into data analysis with questions and looking for data to help you find answers, you go into it with answers and try to find patterns to support your answer.
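A small simulation makes the trap concrete. The sketch below (all names and numbers are illustrative assumptions) builds a purely random target and a pile of purely random ‘features’, then goes hunting for the feature that best ‘supports’ a relationship. Nothing in the data is real, yet the more features you search through, the stronger the best correlation you can report.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_features, n_rows, seed=0):
    """Search random noise features for the one that best 'explains' a
    random target, and return the largest absolute correlation found."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(feature, target)))
    return best


# Same noise, different amounts of searching.
for n_features in (1, 20, 200):
    best = strongest_chance_correlation(n_features, n_rows=30)
    print(f"searched {n_features:>3} noise features, best |r| = {best:.2f}")
```

If you start with the answer and keep digging until something lines up, something eventually will. Starting with the question, and deciding what you will test before you look, is what keeps a result like this from ending up in a report.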

Don’t fall into the data mining trap. Keep an open mind, manage your bias and look for the answers. Also, there’s nothing wrong with finding other questions (and answers) while data mining but keep that bias in check and you’ll be on the right path to avoiding the data mining trap.

Photo by James & Carol Lee on Unsplash

Foto Friday – On the way to Zion National Park

I call this one “On the way to Zion”.

Captured with Canon 5D and Canon 17-40 handheld on the way into Zion National Park.

 

Purchase a copy for your wall.

An image of Zion National Park

While in Zion a few years ago, I stopped by the side of the road to grab a quick snapshot. I didn’t do anything with this at the time but now looking back at it, I really like it. Lots of things to catch your eye.

This one skill will make you a data science rockstar

Image for data science rockstar

Want to be a data science rockstar? Of course you do! Sorry for the clickbait headline, but I wanted to reach as many people as I can with this important piece of information.

Want to know what the ‘one skill’ is?

It isn’t Python or R or Spark or some other new technology or platform. It isn’t the latest machine learning methods or algorithms. It isn’t being able to write AI algorithms from scratch or analyze terabytes of data in minutes.

While those are important – very important – they aren’t THE skill. In fact, it isn’t a technical skill at all.

The one skill that will make you a data science rockstar is a so-called ‘soft skill’. The ability to communicate is what will set you apart from your peers and make you stand out in an increasingly crowded world of data scientists.

Why do I need to communicate to be a data science rockstar?

You can be the smartest person in the world when it comes to creating some wild machine learning systems to build recommendation engines, but if you can’t communicate the ‘strategy’ behind the system, you’re going to have a hard time.

If you’re able to find some phenomenal patterns in data that have the potential to deliver a multiple X increase in revenue but can’t communicate the ‘strategy’ behind your approach, your potential is going to be unrealized.

What do I mean by ‘strategy’? In addition to the standard information (error rates/metrics, etc.) you need to be able to hit the key ‘W’ points (‘what, why, when, where and who’) when you communicate your output/results. You need to be able to clearly define what you did, why you did it, when your approach works (and doesn’t work), where your data came from and who will be affected by what you’ve done. If you can’t answer these questions succinctly and in a manner that a layperson can understand, you’re failing as a data scientist.

Two real world examples – one rockstar, one not-rockstar

I have two recent examples for you to help highlight the difference between a data science rockstar (i.e., someone that communicates well) and one not-so-much rockstar. I’ll give you the background on both and let you make up your own mind on which person you’d hire as your next data scientist. Both of these people work at the same organization.

Person 1:

She’s been a data scientist for 4 years. She’s got a wide swath of experience in data exploration, feature engineering, machine learning and data management. She’s had multiple projects over her career that required a deep dive into large datasets and she’s had to use different systems, platforms and languages during her analysis. For each project she works on, she keeps a running notebook with commentary, ideas, changes and reasons for doing what she’s doing – she’s a scientist after all. When she provides updates to team members and management, she provides multiple layers of detail that can be read or skipped depending on the level of interest of the reader. She provides a thorough writeup of all her work with detailed notes about why things are being done the way they are done and how potential changes might affect the outcome of her work. For project ‘wrap-up’ documentation, she delivers an executive summary with many visualizations that succinctly describes the project, the work she did, why she did what she did and what she thinks could be done to improve on it. In addition to the executive summary, she provides a thorough write-up that describes the entire process with multiple appendices and explanatory statements for those people that want to dive deeply into the project. When people are selecting team members for their projects, her name is the first to come out of their mouths.

Person 2:

He’s been a data scientist for 4 years (about 1 month longer than Person 1). His background is very technical and he is the ‘go-to’ person for algorithms and programming languages within the team. He’s well thought of and can do just about anything that is thrown over the wall at him. He’s quite successful and is sought after for advice by people all over the company. When he works on projects he sort of ‘wings it’ (his words) and keeps few notes about what he’s done and why he’s chosen the things he has chosen. For example, if you ask him why he chose Random Forests instead of Support Vector Machines on a project, he’ll tell you ‘because it worked better’ but he can’t explain what ‘better’ means. Now, there aren’t many people that would argue against his choices on projects and his work is rarely questioned. He’s good at what he does and nobody at the company questions his technical skills, but they always ask ‘what is he doing?’ and ‘what did he do?’ during/after projects. For documentation and presentation of results, he puts together the basic report that is expected with the appropriate information but people always have questions and are always ‘bothering him’ (again…his words). When new projects are being considered, he’s usually last in line for inclusion because there’s ‘just something about working with him’ (actual words from his co-workers).

Who would you choose?

I’m assuming you know which of the two is the data science rockstar. While Person 2 is technically more advanced than Person 1, his communication skills are a bit behind Person 1’s. Person 1 is the one that everyone goes to for delivering the ‘best’ outcomes from data science in the company they work at. Communication is the difference. Person 1 is not only able to do the technical work but also share the outcomes in a way that the organization can easily understand.

If you want to be a data science rockstar, you need to learn to communicate. It can be that ‘one skill’ that could help move you into the realm of ‘top data scientists’ and away from the average data scientists who are focusing all of their personal development efforts on learning another algorithm or another language.

By the way, I’ve written about this before here and here so jump over and read a few more thoughts on the topic if you have time.

Photo by Ben Sweet on Unsplash