<![CDATA[Prahlad G Menon, Ph.D - Assistant Professor The MeDCaVE lab - The Menon Blog ]]>
Wed, 29 Nov 2017 06:17:39 -0800
Weebly
<![CDATA[Guiding the Investor's Pick for Regional Investment in the Greater Pittsburgh Area]]>
Sun, 24 Sep 2017 17:54:51 GMT
http://justcallharry.com/the-menon-blog/guiding-the-investors-pick-for-regional-investment-in-the-greater-pittsburgh-area
This afternoon I had a chance to access a dataset curated from Zillow data, from which it was possible to compute average home prices for several regions in the Greater Pittsburgh Area. Given that this data was available from 1996 through 2017, I was naturally curious to see whether average home prices have in fact increased across all regions in the area, considering the years leading up to the real-estate crash of 2008. To remind those unfamiliar with this event, on December 30, 2008, the Case-Shiller home price index reported the largest price drop in its history. The unique dataset at my disposal this afternoon formed the ideal platform to truly examine the effects of the crash on Pittsburgh, PA.

The plot on the left compares the year-on-year rate of change of average home price (i.e. the "Slope" of average home price, measured in "$ per year" - a term we will use throughout the remainder of this blog post) in all regions put together, leading up to Dec 30, 2008 and after the crash (i.e. considering trends from 2009 to 2017). I'll have to say I was a little disappointed to see such a marginal increase in the Slope of year-on-year average home value: while there was an improvement, it was hardly statistically significant when the entire cohort is considered.
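For readers who want to try the "Slope" comparison on their own data, here is a minimal sketch in Python. It is not the code behind the plots in this post: the yearly prices are made-up numbers for one hypothetical region, and `slope_per_year` and `slope_change` are illustrative helper names that fit an ordinary least-squares line with `numpy.polyfit`.

```python
import numpy as np

def slope_per_year(years, prices):
    """Least-squares slope of average home price, in $ per year."""
    slope, _intercept = np.polyfit(years, prices, deg=1)
    return float(slope)

def slope_change(years, prices, split_year=2008):
    """Return (slope up to split_year, slope after it, change in slope)."""
    years = np.asarray(years, dtype=float)
    prices = np.asarray(prices, dtype=float)
    before = years <= split_year
    pre = slope_per_year(years[before], prices[before])
    post = slope_per_year(years[~before], prices[~before])
    return pre, post, post - pre

# Made-up series for one hypothetical region:
# +$2,000/yr up to 2008, then +$5,000/yr from 2009 onward.
years = np.arange(1996, 2018)
prices = np.where(years <= 2008,
                  100_000 + 2_000 * (years - 1996),
                  124_000 + 5_000 * (years - 2008))

pre, post, delta = slope_change(years, prices)
print(f"before: {pre:.0f} $/yr, after: {post:.0f} $/yr, change: {delta:.0f} $/yr")
```

Running the same two fits per region, rather than on the pooled data, gives the per-region change in Slope that the rest of this post is about.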

But this can't be right, per common knowledge, can it? I thought the market was doing better of late, given the skyrocketing home prices these days. A colleague and mentor of mine once told me that a limited dataset will confess to anything if interrogated sufficiently. So, I decided to tear the data apart a little bit more to make it confess to what I wanted it to tell me. Onward to Exhibit #2 (see below; click to zoom in):

From the plot above, which describes the change in Slope of average home value by region from before Dec 30, 2008 to after this point in time (viz. the X-axis), it seems quite clear that there were several regions in the Greater Pittsburgh Area which did in fact start improving in region-average home price after the crash of 2008 - but some better than others. Contrary to common thought, there were a host of regions which actually started doing "worse" after 2009 than they were doing before the crash. What in the world did these regions have to do in order to accomplish this? Let's find out! In the plot below, I examine how the year-on-year rate of change in home value (i.e. the Slope) after the crash relates to the change in Slope from before the crash to after the crash. Each circle in the plot represents a region, and its size scales with the relative average home value in that region. Further, to add information to this plot, the circles are heat-mapped from light blue to black in increasing order of Slope of year-on-year home value "before" the crash. So, what does this tell us?
Well, it appears as though the regions which were actually doing well up until the crash of 2008 started doing really poorly, with a year-on-year decrease in average home value after the crash! Au contraire, not all the regions which were doing badly before the crash started doing particularly well (in terms of Slope, i.e. year-on-year increase in average home price) after 2009. What is clear, however, is that the largest circles in the plot, i.e. the regions which have the highest region-average home value today, also correspond to those regions which were mid-range in terms of Slope of home value before the crash (i.e. light blue) but also experienced the largest change in Slope from before to after the crash of 2008 (i.e. they're on the far right of the plot above - including Peters Township and Upper St Clair in the South of Pittsburgh, and Franklin Park and Wexford in the North)! Oh, how the times change!

In summary, it appears that the regions which did poorly before the crash (i.e. either getting lower in value from year to year or changing by very small dollar amounts per year) began doing a whole lot better after 2009, with significantly increasing home values every year (see summary plot below), whereas the so-called "Healthy" areas before the crash began losing value every year (i.e. negative Slopes!) after 2009 - mind boggling, right? Ah, but one group of elite regions before the crash did hold its own and surpassed other regions in terms of growth in value after 2009 - which one was this? Alas, the rich tend to always grow richer, and the best performers since 2009 were also the best performers in terms of year-on-year home value before Dec 30, 2008. So, it pays to go big when you're buying your "forever home"!

<![CDATA[Comparing top school districts in Pittsburgh - the buyer's perspective to home-buying and getting bang for your buck]]>
Sat, 09 Sep 2017 18:07:23 GMT
http://justcallharry.com/the-menon-blog/comparing-top-school-districts-in-pittsburgh-the-buyers-perspective-to-home-buying-and-getting-bang-for-your-buck
As a prospective home-buyer targeting a strong school district, I look for schools with a favorable student-teacher ratio (<= 20:1) but also bang for my buck in terms of getting as much house per dollar as possible! However, the latter isn't a trivial landscape to assess, as more often than not the affordability of a home hinges on a reasonable home value for the purpose of tax assessment, in addition to the alluring bells and whistles that set one similarly priced home in an optimal school district apart from another. In this post, I analyze list price, tax assessed value and home quality in terms of interior components, as a function of school district.

My focus in this study is homes valued between $250k and $350k in the Riverview School District (Oakmont, PA) and the West Jefferson Hills School District (Jefferson Hills, PA & Pleasant Hills, PA), looking for statistical evidence of paying optimal value per square foot of home without compromising on the schooling quality accessible in the region. In order to do this, I wrote a PHP-based web-scraper hosted in a local WAMP environment to extract data from the West Penn MultiList, including school district and several fields which aren't typically accessible via any publicly available API - including the Zillow API! For those reading this article who are interested in my code for this, please write to me in the comments, below :)

First, how does list price compare with tax assessed value in Oakmont as opposed to Jefferson Hills? Well, the plot below makes it clear that homes in Jefferson Hills are tax assessed at a value very close to the list price of the home, whereas in Oakmont the trend indicates a substantially lower home value assessment for tax purposes in comparison to list prices. Does this mean Oakmont homes are overpriced?
Maybe Oakmont homes are overpriced - but relative to what? To truly know the answer, we need to learn more about what square footage we get for the same dollars in the two regions!

But first, let's think about what we're paying for based on a list price... Let's remind ourselves that we're talking about a limited cohort of homes (all active listings in the two school districts in the price range of $250k to $350k, as of the first week of Sept 2017), but the data is interesting nonetheless. When we walk into a home as a prospective home-buyer, naive to potential skeletons revealed by a thorough home inspection report, we look for granite counter tops and hardwood flooring - bells and whistles which are quite orthogonal to the primary purpose of a place of shelter (e.g. the roof, or the likelihood of the place being in a flood zone!). Nevertheless, let's take flooring, for instance - does quality flooring increase the list price of a home?

Absolutely, optics and first impressions are everything in the art of selling a good (or a maybe not so good but well-staged) home! But note that list prices are notional dollars and not the same as the selling price of a home, which is on average at least $10k-$15k lower than asking. For more texture on the relationship between list price and sold price in the Greater Pittsburgh Area, maybe check out my previous blog post analyzing about 7.6k sold homes over the past 12 months. But going back to floors, statistics tell us that while wall-to-wall hardwood flooring does have its effect on fetching a higher tax assessment value in any school district, wall-to-wall tiling or vinyl flooring isn't uncommon in homes taxed at higher valuations!
Moving on to square footage! While the assessed home values are higher in Jefferson Hills and Pleasant Hills (both in the West Jefferson Hills School District), Jefferson Hills offers the greatest home square-footage per dollar of tax owed, by far, relative to the overpriced and relatively smaller homes in Oakmont. What Oakmont's got going for it, though, is that you'll probably end up with a somewhat lower mortgage payment in East Allegheny, owing to the lower tax assessments of its homes relative to Jefferson Hills (South Allegheny).
And don't forget those tiny yards in Oakmont relative to the abundance of yard space seen in homes in Jefferson Hills! So, in conclusion, I'd say that Jefferson Hills would be the place to get the most bang for your buck, at least in comparison with Oakmont's Riverview School District. This does seem congruent with common knowledge though - after all, let's remind ourselves that Jefferson Hills was rated the top place to raise a family in the Commonwealth of Pennsylvania, at one point!

<![CDATA[What Drives Real-Estate Prices in Pittsburgh?]]>
Sun, 20 Aug 2017 14:05:51 GMT
http://justcallharry.com/the-menon-blog/what-drives-real-estate-prices-in-pittsburgh
This morning I had a chance to look into a dataset of 7,584 homes sold between Aug 2016 and Aug 2017 in the Greater Pittsburgh Area, from the West Penn Multi-List. This dataset included 9 of the most desirable counties in the area and encompassed 76 unique areas (each corresponding to at least one unique school district). Sold prices in the area had a mean of $249,204.73 +/- $2,127.51 (standard error; std dev: $185,276), with sold prices going all the way up to $2,165,000.00!

Contrary to common belief, I found that while Sold Price was a function of List Price (see Figure below) with a 70% predictive strength, the number of Full Baths was a better descriptor of the variability in Sold Price than the number of Bedrooms and County put together!

So, how does List Price compare with Sold Price for homes with an increasing number of bedrooms? Clearly more expensive homes have a greater likelihood of having more bedrooms, but a really pricey home may still not be the mansion you would expect it to be, while even less expensive homes can have up to 7 bedrooms!
Now let's take a look at how List Price compares with Sold Price for homes with an increasing number of full bathrooms. The plot below makes it clear that the more pricey homes (especially the highest rung of the ladder, viz. homes which sold for above $344k) had a greater likelihood of having more full baths!
To conclude, the data also had something to reveal from the standpoint of the typical Seller Agent's mindset! The statistics make it clear that this group of individuals love a good haggle! The mean List Price for a sold home, at $258,289.08 +/- 2282.84, was ~3.6% higher than the statistically significantly lower Sold Price for these homes ($249,204.73 +/- 2127.51, p=0.0036)! So, if you're on the market for a home in Pittsburgh, you'll want to be sure your Buyer Agent is capable of haggling your price down by at least $9k if you plan on signing a Buyer's Agent Contract.
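The headline numbers in this post are easy to re-derive. The short Python check below uses only the figures quoted above; the one assumption is that the "+/-" values are standard errors of the mean (std dev divided by the square root of the number of homes), which is what the arithmetic suggests.

```python
import math

n_homes = 7584            # sold homes in the dataset
mean_sold = 249_204.73    # mean Sold Price ($)
std_sold = 185_276.0      # standard deviation of Sold Price ($)
mean_list = 258_289.08    # mean List Price ($)

# Assumption: the "+/-" values quoted in the post are standard errors of the mean.
sem_sold = std_sold / math.sqrt(n_homes)

# Gap between mean List Price and mean Sold Price, in dollars and as a percentage.
gap_dollars = mean_list - mean_sold
gap_percent = 100 * gap_dollars / mean_sold

print(f"SEM of Sold Price: ${sem_sold:,.2f}")
print(f"List minus Sold: ${gap_dollars:,.2f} ({gap_percent:.1f}% of Sold Price)")
```

The gap of roughly $9k between the two means is where the "haggle" figure comes from.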
<![CDATA[How to set up a dashboard for an IoT device managed via IBM BlueMix (NodeRED + FreeBoard.IO) ]]>
Sun, 08 May 2016 21:34:32 GMT
http://justcallharry.com/the-menon-blog/how-to-set-up-a-dashboard-for-an-iot-device-managed-via-ibm-bluemix-nodered-freeboardio
I just posted a little YouTube video on how to set up a simple dashboard for an internet of things (IoT) device managed via IBM BlueMix, using a cloud-deployed NodeRED application, coupled with FreeBoard.IO's open-source tools for Dashboarding the streamed JSON data from the IoT device. The intent of this video is to empower IoT enthusiasts with some basic steps on how to quickly set up a meaningful IoT device using the IBM BlueMix cloud.

In this brief tutorial, I first illustrate the functionality of a simple dashboard to visualize live temperature data as well as threshold-based alerts from an IoT enabled sensor. After the brief demo, I attempt to cover how one might go about setting up a similar basic dashboard on freeboard.io as well as an HTTP REST API, put together on NodeRED via the IBM BlueMix cloud.  

A simulated temperature sensor is used in this tutorial but the workflow remains the same (including the concept of the Device ID, illustrated in this video for the simulated device) if one chooses to use Texas Instruments' SensorTag device for an initial road-test of the BlueMix cloud using a real device!
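To give a feel for the kind of JSON a dashboard like this consumes, here is a minimal sketch in Python of a simulated temperature reading with a threshold-based alert flag. It is not the NodeRED flow from the video: the field names, the device ID string and the 30 degree threshold are all hypothetical, chosen only for illustration.

```python
import json
import random
import time

ALERT_THRESHOLD_C = 30.0  # hypothetical alert threshold, in degrees Celsius

def simulated_reading(device_id, rng):
    """Build one JSON-serializable reading for a simulated temperature sensor."""
    temperature = round(rng.uniform(15.0, 40.0), 1)
    return {
        "deviceId": device_id,            # hypothetical field names throughout
        "timestamp": int(time.time()),
        "temperature": temperature,
        "alert": temperature > ALERT_THRESHOLD_C,
    }

rng = random.Random(42)  # seeded so the simulated values are repeatable
payload = json.dumps(simulated_reading("simulated-device-001", rng))
print(payload)
```

A dashboard widget would typically poll an HTTP endpoint that returns a payload of this shape and bind a gauge to the temperature field and an indicator light to the alert field.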

Want to do even more with your dashboard? How about some machine learning! The sequel to the above video, which illustrates how R-based machine learning results are integrated with a Freeboard.IO dashboard, is available at: https://www.youtube.com/watch?v=5lMiESufhwI

<![CDATA[On Predictability and IntraDay Trading: The Value of a Candlestick]]>
Mon, 08 Feb 2016 13:16:27 GMT
http://justcallharry.com/the-menon-blog/on-predictability-and-intraday-trading-the-value-of-a-candlestick
Figure 1. Intraday forecastability analysis.
Typically, algorithmic traders attempt to leverage historical stock prices, price movements and functions of price or volume of trades, including Twitter or Stocktwits based sentiment, to predict the direction of future stock prices. Such models are often supplemented by a money management strategy, implemented in the form of a trade execution engine that uses the historical success of predictions made by a trained model or identified pattern to determine the amount of capital to invest (i.e. in the long or short direction) on future predictions. However, not all inputs to an algorithmic trading model - machine learning based or simple pattern recognition - are created equal. In this blog post, I examine the forecastability of intraday and end-of-day candlesticks by examining open, high, low, close and volume data, independently as well as in combination.

The approach for this analysis was inspired by Goerg 2012 [1], which presents an adaptation of principal component analysis, i.e. a novel dimension reduction technique for temporally dependent signals, utilizing a new forecastability measure, Omega. Omega is an uncertainty metric defined based on the Shannon entropy [2] of the Fourier transform of the autocovariance function (i.e. the spectral density) of a given univariate time series (open, high, low, close or volume, in this study). In this manner, Omega forms a quantitative means to separate a multivariate time series into a forecastable space (Omega >> 0) and an orthogonal white noise space (Omega ~ 0). My analysis of SPY (S&P500 ETF) is presented below.
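To make the idea behind Omega concrete, here is a rough sketch in Python. It is a simplified discrete analogue and not Goerg's implementation or the code used for the figures in this post: the `omega` function, the segment-averaged periodogram used as the spectrum estimate, and the two synthetic test signals are all assumptions made for illustration. The score is one minus the entropy of the normalized spectrum divided by the maximum possible entropy, so a flat (white noise) spectrum scores near 0 and a spectrum concentrated at a few frequencies scores closer to 1.

```python
import numpy as np

def omega(series, n_segments=8):
    """Rough forecastability score in [0, 1]: 1 - (spectral entropy / max entropy)."""
    x = np.asarray(series, dtype=float)
    seg_len = len(x) // n_segments
    # Average periodograms over non-overlapping segments to smooth the spectrum estimate.
    segments = x[: seg_len * n_segments].reshape(n_segments, seg_len)
    segments = segments - segments.mean(axis=1, keepdims=True)
    power = (np.abs(np.fft.rfft(segments, axis=1)) ** 2).mean(axis=0)[1:]  # drop zero frequency
    p = power / power.sum()           # normalize so the spectrum sums to 1
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log(nonzero))
    return float(1.0 - entropy / np.log(len(p)))

rng = np.random.default_rng(0)
t = np.arange(2048)
white_noise = rng.normal(size=t.size)                                      # no structure
noisy_cycle = np.sin(2 * np.pi * t / 32) + 0.5 * rng.normal(size=t.size)   # periodic + noise

print(f"Omega(white noise) = {omega(white_noise):.3f}")
print(f"Omega(noisy cycle) = {omega(noisy_cycle):.3f}")
```

The same function could be applied to each column of a candlestick dataset (open, high, low, close, volume) to rank the series by this rough score.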

A look into 5, 10 and 60 minute intraday candlesticks of SPY between 1 Jan 2016 and 5 Feb 2016 led to a rather surprising revelation: intraday "volumes" are a better predictor of their own future values than any of the other tested univariate time series, viz. the open, high, low and close series. Surprisingly, close prices - the often recommended gold-standard price that is supposedly least affected by end-effects, instabilities and such - were found to be the "least" predictable series!

Figure 2. EOD candlestick forecastability.
Au contraire, and slightly disappointingly so, volume wasn't as forecastable an index using end-of-day (EOD) candlestick data (see Figure 2), while the relatively poor predictability of close and adjusted-close prices didn't cease to disappoint!

As expected, confidence in the reported forecastability, as evidenced by the p-value for the reported series-specific Omega, reduced as the lag for time series forecasting incremented further and further into the future (owing to perhaps a lesser amount of data being available at 10x5 minute intervals than 1x5 minute intervals, for instance). That said, all reported Omega data in Figure 1 (i.e. 5, 10 and 60 minute intraday candlestick analysis) and Figure 2 (EOD candlestick analysis) were statistically significant. 

Some lessons learned here, perhaps: a) Never underestimate the value of intraday candlesticks; and b) If you're an algorithmic trader attempting to leverage historical stock prices alone to predict prices, think again! Intraday volumes may offer your algorithm some pleasant surprises and improved predictive performance.

Also of interest might be that an analysis of FCX (Freeport-McMoRan) and WTI (W&T Offshore Inc., used here as a crude oil metric) revealed similar results, except that WTI adjusted-close prices were classified (based on the Omega / entropy analysis) as "white noise"!

In principle, it is possible to leverage univariate Omega values as a maximizable objective function to design an optimal function of a time series (or a linear combination of time series) that is more forecastable than any independent univariate time series. Although this so-called optimal time series is likely to be highly specific to the stock and to the tick interval / candlestick frequency, a truly forecastable truth is out there for every ticker!

Oh, and in case you were wondering what the "blue dotted / dashed lines" were on the plots in Figures 1 and 2 - they represent the heightened level of predictability (i.e. Omega) of a multivariate index determined as a linear combination of open, high, low, close and volume (Figure 1) and of open, high, low and close (Figure 2). In my experiments so far, a 60% to 70% improvement in predictability is possible to achieve using a combination of the univariate variables which constitute a standard candlestick time-series dataset.

[1] Goerg, G. M. (2012). Forecastable Component Analysis (ForeCA). arXiv preprint arXiv:1205.4591.
[2] Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal 27, 379–423, 623–656.

<![CDATA[Mending a Broken Machine: On Predictive Maintenance and Machine Learning]]>
Wed, 03 Feb 2016 23:42:53 GMT
http://justcallharry.com/the-menon-blog/mending-a-broken-machine-on-predictive-maintenance-and-machine-learning
Predictive maintenance - a burgeoning science - is the art of using machine learning to help determine the condition of in-service equipment, usually in real-time, and therefore predict when maintenance should be performed. This approach promises cost savings over routine or time-based preventive maintenance and additionally offers the flexibility to algorithmically generate alerts which may avert catastrophic events.

I have been involved with implementing some predictive maintenance and condition based monitoring algorithms recently, primarily with application to the transportation industry, and decided to do a little literature search on the subject: 

The figures above provide a summary of the literature in the predictive maintenance space (generated based on Web of Science search results). The science of predictive maintenance has grown exponentially (in terms of cited and new literature) since 2014, with a remarkable performance from the transportation sector. The power of data has truly spread the wings of the transportation sector - planes, trains and automobiles - thanks to the ubiquity of machine learning, the virtually limitless extents of cloud storage and of course elastic cloud computing!

<![CDATA[Random Musings: A poem fished out of my old iPhone notes!]]>
Wed, 02 Dec 2015 12:31:49 GMT
http://justcallharry.com/the-menon-blog/random-musings-a-poem-fished-out-of-my-old-iphone-notes

It is uncanny when one finds interesting iPhone Notes from years ago. This morning I was on a random walk back through time when I stumbled into a little poem that I recall writing one morning during a visit to a Rainforest, in Puerto Rico, when there was no rain. Childish but interesting...

Rainforest , O' Rainforest
It is dry and cold outside
So, relent and harbor your time 
With the winds of tranquility ...

Reminisce the days 
Your trees danced in the rain
And raindrops broke their fall 
From stormy skies

On to your leaves,
In a million little bits
They splashed to the ground 
Nourishing flowing brooks

But it is dry and cold outside
So, relent and harbor your time 
With the winds of tranquility 
Rainforest , O' Rainforest !

<![CDATA[MarketsMD Blog: Stock Market Sentiment, Machine Learning and Daily Price Movement Forecasting!]]>
Mon, 02 Nov 2015 01:26:15 GMT
http://justcallharry.com/the-menon-blog/marketsmd-blog-stock-market-sentiment-maching-learning-and-price-forecasting
I have grown to become quite passionate about modeling daily stock market price movements on a select universe of stocks and analysis of market sentiment using a suite of in-house algorithmic approaches. Therefore, I've begun a second blog focused on reporting some of my daily stock market price movement predictions as well as occasional reports on market sentiment! Check out the MarketsMD Blog at www.long-short.com or www.marketsmd.co.nr.

The following is a link to my latest post analyzing stock market sentiment for November 2015 using Twitter feeds as a data source: http://quantmd.weebly.com/marketsmd/market-sentiment-for-november-15 ]]>
<![CDATA[Taking UberCloud for a spin- A biofluid dynamics test run!]]>
Mon, 09 Feb 2015 03:18:14 GMT
http://justcallharry.com/the-menon-blog/taking-ubercloud-for-a-spin-a-biofluid-dynamics-test-run
<![CDATA[Informing patient behavior - the key to a healthy future with a reasonable price tag.]]>
Sun, 05 Oct 2014 13:45:43 GMT
http://justcallharry.com/the-menon-blog/informing-patient-behavior-the-key-to-a-healthy-future-with-a-reasonable-price-tag
Patient non-compliance with treatment regimens has always been a significant challenge in chronic disease care. To quote an excerpt from a rather interesting MedScape CME education article, “Any drug that you do not take does not work” [1]. Patients' day-to-day decisions have a tremendous impact on the reported efficacy of treatment regimens and, more importantly, on their own health. Therefore, in the information age, the prevailing view is that patients should become active, informed participants taking ‘charge’ of their individual care management processes, in order to guarantee the overarching success of the healthcare delivery system as a whole. Physicians must help patients take charge of their conditions by encouraging them to set self-management goals, but also allow them to become actively involved with the diagnosis and selection of treatment options for their ailments.

The old models of care, where physicians would tell patients what to do and try to motivate them to change their habits or ways, do not have a place in today’s day and age for a plethora of reasons, which eventually boil down to the fact that the average internet-enabled individual today does not like being told what to do without forming his / her own opinions. Working out a fool-proof way of getting through to the modern patient is a challenge with substantial medical and economic ramifications. To shed some light on the substantial cost implications of effectively taking control of patient behavior, it is interesting to note that a majority of healthcare spending (outside of public-health expenditure) is affected by consumer choices. In fact, 69% of healthcare spending – including spending for catastrophic events attributed to chronic conditions, discretionary care and end-of-life care – is largely subject to individuals’ choices and behavior [2].

Assimilating the facts, therefore, informing patient behavior clearly has ramifications for both the effectiveness of patient care and healthcare economics. Now let us go over some interesting approaches which are gathering momentum in regard to informing patients about the state of their health and therefore indirectly allowing the healthcare system to begin gaining control of ‘patient behavior’. One way to make patients active participants in healthcare administration is to allow them to actively partake in the diagnosis of their health and, further, to actually ‘operate’ diagnostic devices without scheduling appointments. This very concept is being pursued by a startup company from Ohio – HealthSpot [3] – as well as Mayo Clinic [4], in the form of ‘diagnostic kiosks’, and is expected to make a revolutionary impact in regard to patient empowerment and reducing healthcare costs. Such technology has potential for impacting diagnosis and administration of treatments for minor and common health conditions such as the cold, earaches, sore throat, sinus infections, upper respiratory infections, rashes and skin conditions, and eye conditions, but tends to be less suited to managing the ailments of less able-bodied patients with chronic conditions.

Beyond the kiosk concept, several medical device startups, some with FDA 510(k)-cleared product offerings [5], as well as major technology firms such as Apple and Samsung, are investing heavily in technology that can monitor a user's health using their mobile devices. As per a recent survey reported by BBC News [6], more and more patients today are going to their general practitioner with preconceived notions regarding their expected treatment pathways, gleaned from apps and the internet. However, the BBC article also reports that a patient's online diagnosis is often not useful to the physician.

From the standpoint of the Devil’s Advocate to the above concepts, the hypothetical scenario of generating a population of non-medically trained but highly opinionated patients could be quite worrisome… In such a scenario, what might begin as a social issue of inappropriate self-administration of inordinate doses of over-the-counter medication may in the long term create a faction of quasi-trained healthcare professionals – or worse still, medical software / apps – with a greater power over patient beliefs than physicians themselves! So, how far must we actually go with regard to the liberties offered to active and informed patients in the care management process?

[1] “Adherence: The Silent CV Risk Factor” on Medscape. http://www.medscape.org/viewarticle/582903
[2] McKinsey International Survey on Healthcare’s Digital Future: http://www.mckinsey.com/insights/health_systems_and_services/healthcares_digital_future
[3] HealthSpot: http://www.healthspot.net/
[4] Mayo Clinic Telehealth Kiosks: http://medcitynews.com/2014/10/mayo-clinic-telehealth-kiosks/
[5] http://medcitynews.com/2013/01/iphone-app-for-retinal-images-cleared-by-fda-could-expand-telemedicine-eye-exams-video/
[6] http://www.bbc.co.uk/news/technology-29458143