I’ve spent the last two weeks with my head buried in programming languages.
I’ve needed to rewrite some data-analysis scripts for my research. I initially wrote them in R but found that R is particularly slow for this type of analysis (more accurately, my implementation of these analysis techniques is slow).
So… I started looking for a more economical way to do this analysis. I’m using PHP for some of the up-front data collection, so my logical choice was to dust off my PHP skills and build the analysis scripts in PHP too.
So I got out my PHP books and started coding. After a few days, I had a pretty impressive set of scripts that would take my collected data, run a Bayes classification filter on it for sentiment, and then summarize the results. I was proud of myself… until I realized that my implementation of the classification algorithm would be difficult to justify in an academic setting, or at least that I’d have to spend a lot of time defending it later. That difficulty was also one of the reasons I had wanted to rewrite the R scripts in the first place.
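For readers curious what a hand-rolled Bayes filter involves, here is a minimal Python sketch of a multinomial naive Bayes sentiment classifier with add-one (Laplace) smoothing, followed by a simple summary step. It is not the PHP code described above; the whitespace tokenizer, the names `NaiveBayesSentiment` and `summarize`, the `pos`/`neg` labels, and the tiny training set are all hypothetical, chosen only to show the counting and log-probability steps.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch, not the author's PHP implementation.


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


class NaiveBayesSentiment:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # documents seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()                       # every word seen in training

    def train(self, labeled_docs):
        for text, label in labeled_docs:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            # Log prior: log P(label)
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Smoothed log likelihood: log P(word | label)
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def summarize(classifier, texts):
    """Count how many texts fall under each predicted sentiment label."""
    return Counter(classifier.classify(t) for t in texts)


# Hypothetical training data: (text, label) pairs.
training_data = [
    ("I love this great product", "pos"),
    ("what a wonderful happy day", "pos"),
    ("this is terrible and awful", "neg"),
    ("I hate this bad service", "neg"),
]

clf = NaiveBayesSentiment()
clf.train(training_data)
print(summarize(clf, ["great happy day", "terrible bad service"]))
```

Even a sketch this small embeds choices (tokenization, smoothing, how unseen words are handled) that a reviewer could question, which is part of why a widely cited toolkit is easier to defend than home-grown code.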
So…I revisited my approach. Was there anything written in PHP that was well received in the academic world? Of course not.
One approach used by many researchers in text classification and sentiment analysis is the Python language with the Natural Language Toolkit (NLTK), and there are plenty of academic articles citing NLTK… so that helps me defend the algorithms in my dissertation work.
Now… I’d never looked at Python. I couldn’t have written a “Hello World” program in Python. But it needed to be done, so I found some resources on the web and dove in. Over the course of a few hours I wrote my analysis and summary scripts in Python… and was absolutely amazed at how quick this language is. My buddy Jeff is probably getting tired of me telling him how great Python is… but oh well, he’ll keep hearing it 🙂
I was able to get my analysis time down from 8–9 hours in R to about 1.5 hours in Python. Talk about a time saver! Now, most of that savings is probably due to new approaches to the analysis rather than a pure Python vs. R speed difference… but the rewrite is what forced me to rethink my approach.
Why tell you about my newfound skillz (I’m told you have to use ‘z’ in this usage of the word)?
Part of me wanted to brag a bit 🙂
But, more importantly, learning a new programming language isn’t necessarily about the language itself… it’s about the discovery process. For me, learning Python forced me to rethink the data analysis I was working on, and the outcome is a faster analysis with potentially more accurate results as well as a more defensible algorithm. Learning a new language forced me to think through my approach, including its inputs and outputs.
When was the last time you took a step back and rethought your approach? You don’t need to learn Python to do it… just step back from your day-to-day grind and really look at what you are doing. Is it working for you? Is it working for your team and/or organization?
If the answer isn’t an unequivocal ‘yes’, then maybe you need to rethink your script(s) and look for a new approach.
12 responses to “I learned Python…and much more”
Great post and thanks for the link 🙂
Thanks and you are welcome 🙂
Good post. I assume you are using the methodology of positive and negative sentiment scoring? How is it that something as untestable as that makes it through academic circles, but implementing a filter that can be validated with a few unit tests requires a lengthy defense? This is not a mark against you. For years now I’ve questioned the whole sentiment-polarity approach. Emotion is so much broader than a simple +/- range.
http://en.wikipedia.org/wiki/List_of_emotions
@Nectarineimp Thanks for the comment.
I am using that approach, based on years of academic research in the space. Of course, it isn’t perfect (what is?), but the research shows it is a viable approach to summarizing the sentiment of written text. Again… it’s not perfect, but it is interesting.
Published: I learned Python…and much more http://t.co/yyluINTK
I learned Python…and much more http://t.co/QiB6YKTl via @EricDBrown
Good one. RT @EricDBrown: Published: I learned Python…and much more http://t.co/FlQR352x
I learned Python…and much more http://t.co/T8UGUH6j
I learned Python…and much more http://t.co/Gk3dE9YU from @ericdbrown
I’m curious how long it took you to write the code in Python, as opposed to PHP. I realize that in the former case you were also learning a new language, but it’s still of interest.
@gamesbook I wrote my PHP code over the course of a few days. The corresponding Python code was written in about 4 hours. The difference was mostly due to the effort required to understand and write the Bayes algorithm in PHP, whereas Python’s Natural Language Toolkit (NLTK) was ready-made and just required a few calls to the toolkit.
@ericbrown @gamesbook People often use the phrase “Python: batteries included”. That is true, but it’s also the car charger, solar cells, and portable nuclear reactor, which are just as readily available, that make Python *really* useful.