Thursday, October 20, 2016

White Papers vs. the Real World of Machine Learning, Big Data, Blockchain, and Artificial Intelligence



     Artificial Intelligence, or really machine learning, is all the hype now.  Everybody is doing it. Everybody has it. Everybody needs it.  Everybody wants it. It'll change the world. It'll disrupt. It'll make you rich. It'll save your hide.  It's better than the discovery of fire and sandwiches.

     Throw in some smart bots, big data, Spark, cloud, and cognitive computing and you hit the grand slam of them all.  Throw up a couple of cool white papers around neural networks and linear regression, an article or two around TensorFlow or Theano or Torch or Caffe, and you're good to go, with TensorFlow having become the de facto 'everyone is using it these days' tool.  Or so you would think, until you start talking to people who are actually using these tools.

     Silicon Valley and a few other areas are indeed 'all in' when it comes to neural networks, AI, machine learning, big data, etc.  But many of the white papers you read, and the webinars and meetups you attend, are by people who work for companies that don't actually make money.  Or they work for Google or Facebook, which is almost like not working in the real world anymore.  Or it sounds far more like resume building than a discussion of something they did that made an impact.  I'm sure many of them do make huge impacts, but you rarely hear about it; I've attended these talks, listened, and read the papers.  And many of the companies are new.  New as in they have existed for a year or less and are 'disrupting and changing the world.'  Except many are not much different from Theranos.  Maybe not at that level, but at the level where all the cool things they do haven't made them any money; it just got them more VC funding.  Great if you can do it, but pointless to the rest of the world when it comes to use cases or real usage of those algorithms.

     Hadoop and even AWS have had that problem for years. Most of the corporate and government worlds want easy buttons.  Nobody wants to spend a year figuring out how to actually use the right tools in AWS, or which open source tool is the latest and greatest in the Hadoop ecosystem, or how to get their staff to actually learn it while still working a '9-5' job.  Many want to be innovative, but they also follow slow processes, procedures, and policies. And while many of us, including myself, dislike some of these slow, DMV-like procedures and documentation requirements, the other reality is that none of these places can live in a world where a billion-dollar moonshot failure gets answered with a shrug of the shoulders.

       I mean, imagine if anybody else came up with Google Glass and failed so miserably.  IBM has strung together 18 consecutive losing quarters, but they still make billions.  Imagine if they came up with Google Glass and just shrugged it off. People would be killing them.  People already are over Watson and how it's not making them money.   Google gets away with many things: their Pixel phone with the terrible commercials, for instance.  Microsoft was mocked for its disastrous Surface and Windows Phone commercials and product placements. Google gets a pass, but their commercials are even worse, and kind of full of themselves.

     But back to the original point: people who work for Google can get away with implementing and playing around with all these cool AI, machine learning, and robotics things.  Most other people in the corporate or 'real world' can't spend six months on deep learning algorithms and come back with: hey, Google+ was a huge failure, Hangouts is being phased out, our image recognition was racist and often wrong, our Nest thermostat was a recall nightmare, the Motorola purchase was a bankruptcy-sized failure, but hey, moonshots, so what.

     That's a lot of 'so whats' that nobody else can get away with. But it's also a lot of 'so whats' that the rest of us can learn from, utilize, and try to push into some real, successful results for the rest of the corporate and government worlds.  Instead we get a ton of white papers that go even further into moonshot theories.  I've read a ton of white papers; some of them I don't understand half of, or any of, really.  But many of them are great in theory, great for some PhD, great for some moonshot Google project, and rather useless in the real world.

     I mean, I've read papers that went into detail on how to implement everything in MATLAB.    Really?  And you want people to do that in the corporate world?   It gets even more interesting when an entire white paper is great at comparing algorithms and models but never answers whether any of it worked for a business.   It was pretty much a love-fest over which algorithm to implement rather than anything actually useful to a business entity.

     We need far more useful use cases and fewer resume-building, look-how-smart-I-am white papers.  There are some best practices and strategies to use, but forcing what worked in some college POC isn't the same as doing it for a big corporation that has been plugging away for 40 years. Those corporations might not be doing everything right, but then again, since many Silicon Valley startups don't really make any money either, the only thing the startups are doing better is conning people into giving them VC money instead of marketing to people who would actually use what they've built.

Next time I will talk about neural networks and using AWS to implement some cool things that are actually useful for real businesses, not just the Googles and Stanfords of the world.


Friday, October 14, 2016

elementryMind - Real Estate Forecasting whitepaper

In this whitepaper, we proposed an artificially intelligent real estate mind that can predict, forecast, and estimate values, built on Artificial Neural Networks (ANN), machine learning, and Natural Language Processing (NLP).
This AI real estate mind was evaluated on the sales prices of San Diego properties.

The other goal was to utilize blockchain-like infrastructure to create a decentralized, open-source real estate MLS system. But that is part of another whitepaper, not this one.

So, getting on with this whitepaper: the buyers and sellers (mainly real estate investors looking to buy, hold, or pass) will make decisions based on these predictions about properties, value, opportunity, and neighborhoods.  We plan on expanding the analysis, with better performance tuning and optimization, to all of California, to cities that have or are proposed to have Google Fiber, and to Mexico. Integration with speech software (Google Now, Siri, Cortana, IBM Watson, Amazon Echo) will be experimented with and tested in the future.
ANN + Machine Learning + NLP are useful in modeling this capability and can be very useful in complex systems like real estate where motivations are determined by a combination of factors such as crime, school, neighborhood, jobs, cost, budget, and even emotion.  

Artificial Neural Networks (ANNs) like the multilayer perceptron, with and without bagging, are useful for modeling input-output relationships learned directly from observed data.

Machine learning methods like SVM (support vector machine), LSSVM (least squares support vector machine), Linear Regression, M5 Model Trees, and Naïve-Neighborhoods are used to forecast real estate property values.  

NLP is used to analyze unstructured data and noise.



Real Estate Value Forecasting Based on Artificial Neural Networks, Machine Learning, & Natural Language Processing
Peter Jamack, Darrick Sogabe, Darren Kempiners, Shirity Priya, Douglass Brown, Olu Oyedipe
San Diego  © 2016
Abstract
In this paper, we propose an artificially intelligent real estate mind that can predict, forecast, and estimate values, built on Artificial Neural Networks (ANN), machine learning, and Natural Language Processing (NLP).
This AI real estate mind was evaluated on the sales prices of San Diego properties.

Buyers and sellers will make decisions based on these predictions about properties, value, opportunity, and neighborhoods.  We plan on expanding the analysis, with better performance tuning and optimization, to all of California, to cities that have or are proposed to have Google Fiber, and to Mexico. Integration with speech software (Google Now, Siri, Cortana, IBM Watson, Amazon Echo) will be experimented with and tested in the future.
ANN + Machine Learning + NLP are useful in modeling this capability and can be very useful in complex systems like real estate where motivations are determined by a combination of factors such as crime, school, neighborhood, jobs, cost, budget, and even emotion.  
Artificial Neural Networks (ANNs) like the multilayer perceptron, with and without bagging, are useful for modeling input-output relationships learned directly from observed data.

Machine learning methods like SVM (support vector machine), LSSVM (least squares support vector machine), Linear Regression, M5 Model Trees, and Naïve-Neighborhoods are used to forecast real estate property values.

NLP is used to analyze unstructured data and noise.
Introduction
The main aim of this paper is to define a real estate property forecasting system based on ANN + machine learning + NLP. This system should be able to accurately predict real estate values, WACC, and LTV, and to outperform both stand-alone supervised learning algorithms and the intuition and analysis of real estate agents and brokers. It will do this by integrating ANN and machine learning with Natural Language Processing (NLP) that scours social media, the web, the dark web, open data, closed data, etc. for noise and non-noise information retrieval.
The proposed system could also help in simulating interactions, development and proposals where location choices for housing, schools or companies strongly depend on the real estate market.
The main input parameters of the proposed system are real estate pricing, sales, comparable sales, crime, neighborhood, schools, unemployment, transportation, construction costs, cash flow, rental market, economic and environmental quality related attributes.  
The United States has three main real estate indexes: the National Council of Real Estate Investment Fiduciaries Property Index (NPI) for commercial real estate, and, for residential real estate, Radar Logic's RPX and the S&P Case-Shiller indices.
Artificial neural networks (ANNs) are constructed from many neuron nodes and corresponding weights in an artificial system that simulates biological neural networks.  ANNs handle nonlinear characteristics well, so they can simulate nonlinear functions. However, accuracy can be low, and performance may depend on powerful GPU computing.  For certain analyses an ANN alone is not ideal, which is why combining machine learning algorithms with ANN makes a more ideal, integrated solution.
The paper is organized as follows.
Section 1 is a literature overview, and Section 2 presents the real estate models. In Sections 3-5, the proposed Artificial Neural Network (ANN), machine learning, and NLP models are defined using datasets from San Diego; the paper experiments with the impact of key real estate attributes and neighborhood elements, including sales price, historical trends, unemployment, school ratings, crime statistics, comparable neighborhood prices, and economic factors. In Section 6, the results are discussed. In the last section (7), conclusions are drawn that validate the opportunity presented by ANN + machine learning + NLP for real estate forecasting and estimating.
  1. Literature
Artificial Neural Networks (ANNs) have the ability to learn, to generalize results, and to respond and predict adequately when given incomplete or unknown data (Shaw, 1992). ANN methodology was developed to capture functional forms, allowing the uncovering of hidden non-linear relationships between variables.
ANN research represents a sub-field of computer science concerned with using computers for tasks normally considered to require knowledge and cognitive abilities (Gevarter, 1985).
ANNs have been applied to property price forecasting in recent years (Lai Pi-ying, 2011). Borst (1991) defined a great number of variables in his network to appraise real estate in New York State, demonstrating that ANNs are able to predict real estate prices with 90% accuracy.
ANNs perform better than multivariate analysis, since the networks are nonlinear. They can also evaluate subjective information, such as schools, neighborhood, crime, unemployment, transportation, fun, and other characteristics of the environment, which are difficult to incorporate into traditional mathematical approaches.
SVM methods, founded on statistical learning theory, were developed in the 1990s to find a globally optimized solution by solving quadratic programming problems. However, as the number of data points grows, so does the complexity.  SVM offers strong learning and generalization abilities and is used mainly for classification and regression problems.
LSSVM improves the results of SVM by changing the inequality constraints in SVM: the original quadratic programming problem becomes a problem of solving a system of linear equations. LSSVM reduces parameter adjustments, reduces the complexity of the SVM calculation, and improves the efficiency of the calculation. However, LSSVM loses the sparse characteristics of SVM.
Natural Language Processing (NLP) was developed in the 1950s, and even earlier, beginning with Alan Turing's paper "Computing Machinery and Intelligence," which introduced what became known as the Turing Test. Most recently, NLP has focused on supervised and unsupervised learning methods.
  2. Real Estate Forecasting Model(s)
What is a reasonable price or return before even looking at properties?

There is a formula for this: the Weighted Average Cost of Capital (WACC).
The WACC takes into account leverage and risk to calculate the required equity return, r(e).
r(e) = [r(p) - LTV * r(D)] / (1 - LTV)

  • LTV is the loan-to-value ratio of the mortgage
  • r(D) is the interest rate on the loan
  • r(p) is the real property return (7-8 percent on average)
So if you set the LTV to 80%, the interest rate to 5%, and the property return to 8%, the WACC equation calculates the required equity return, r(e), at 20%:

r(e) = [0.08 - (0.80) * 0.05] / (1 - 0.80) = 0.20, or 20 percent
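To make that concrete, here is a minimal Python sketch of the required-equity-return calculation (the function and argument names are my own illustration, not part of the paper):

def required_equity_return(r_p, r_d, ltv):
    """Required equity return r(e) implied by the WACC relationship."""
    return (r_p - ltv * r_d) / (1.0 - ltv)

# The example from the text: 8% property return, 5% loan rate, 80% LTV.
print(required_equity_return(0.08, 0.05, 0.80))  # ~0.2, i.e. 20 percent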

We also have to remember that time costs money.

Present Value Formula (aka Present worth)

P = F / (1 + i)^n
  • P is the present value or worth of the object in question
  • F is a future payment or cost
  • i is the rate of return or discount rate
  • n is the number of time periods (years or months) considered
An example: what would $100K received 5 years from now be worth today?

PVF doesn’t need an interest rate, but the yearly average inflation (2%) as a discount rate.  The calculation is in years, n is 5.  
P = 100,000/(1+0.02)5]
P = $90,573
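A quick sanity check in Python (assuming, as above, 2% average yearly inflation as the discount rate):

# Present value of $100K received 5 years from now, discounted at 2% a year.
present_value = 100_000 / (1 + 0.02) ** 5
print(round(present_value))  # -> 90573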
The following expenses play an important role in the return and should therefore be added to the model to create a robust and safe estimate.
  • Acquisition (before purchase)
      • Property inspection
      • Environmental inspection
      • Closing costs at purchase (2-3%)
      • Loan origination fee from lender
      • Discount points on the loan's interest
      • Credit report fees
      • Appraisal fee (on the 80 percent LTV value, not the purchase price)
      • Mortgage insurance application costs
      • Mortgage broker fees
      • Real estate broker/agent fees
      • Real estate taxes
      • Repair & renovation
  • Flipping (accounted for each month until sale)
      • Mortgage payment
      • Repairs & remodeling costs (think minor painting, cleaning, landscaping)
      • Landscaping costs
      • Utilities
      • Insurance
      • Real estate broker's fee at sale
      • Real estate taxes
  • Sale
  • Timing and loss
      • If it takes 3 months to sell
      • If it takes 6 months, or 12 months
Net Present Value (NPV) is the sum of the future cash flows minus the purchase price. Take the time series of cash flows (expenses out and income in), discount them, and add them up.  That is the present value.
Subtract the purchase price and you get the net present value.

NPV = sum[ F_n/(1+i)^n + … + F_t/(1+i)^t ] - Project Costs

It looks similar to the present value formula.
  • F = expenses or cash flow for that particular period of time (if the project is under a year, use months)
  • n = each period of time (the 2nd month would be 2); n = 1 is the starting period, when you first purchased the property
  • Expenses and other due diligence costs can be added to the purchase price to represent total project costs
  • i = the opportunity cost of capital, i.e. your return expectation for the project
      • Example: for a 5-month project (5/12 of a year) against a 20% yearly hurdle, divide the hurdle by the project months: 20% / 5 months = 4% per month
  • t = the period in which the sale takes place
  • Refine the model by adding mortgage costs: input monthly payments as expenses and subtract the payoff balance from the sale
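Here is a minimal Python sketch of that NPV calculation; per the list above, it assumes period numbering starts at n = 1 and that the purchase equity and acquisition costs are folded into the first period's cash flow:

def npv(cash_flows, rate):
    """Net present value of periodic cash flows; periods are numbered from 1."""
    # rate is per period: use a monthly rate for monthly cash flows.
    return sum(cf / (1 + rate) ** n for n, cf in enumerate(cash_flows, start=1))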

NPV formula example
  • Bought a $100K fixer-upper
  • 20% down ($20K)  
  • 3% closing costs ($3k)
  • $500 painting and landscaping costs
  • $700 per month mortgage and other costs
  • Sell it in 5 months for $135K
Month       1           2        3        4        5
Equity      ($20,000)   0        0        0        0
Expenses    ($4,200)    ($700)   ($700)   ($700)   ($86,700)
Income      0           0        0        0        $135,000
Total       ($24,200)   ($700)   ($700)   ($700)   $48,300
  • Closing costs and expenses add up to $6K

Compute the NPV of the totals; the result should be around $14,200.
It's a positive result, which means the project's return exceeds the investment requirements.
A good rule of thumb is to consider projects that produce a zero or positive NPV. An NPV of zero means the project exactly meets your opportunity cost requirement.  NPV translates into a return percentage via the internal rate of return (IRR).
IRR is the value of i, the opportunity cost of capital, that causes the NPV to calculate to zero. Remember, if your project is month to month, the IRR value is monthly.
In our example, IRR calculates to a project return of 17% per month.
Remember this is still only modeling a potential project, so reality could change the return.
If the investment hurdle for the project was 20-percent over five months, you would check to make sure the IRR exceeded (20-percent divided by 5 months) 4-percent per month.
The value of the NPV and IRR calculations shows up when the time schedule changes, so stress testing (sensitivity testing) is a good thing to add to the model. If the project couldn't sell for 12 months, keeping all the other numbers the same would make a big impact: mortgage payments and expenses pile up, and the time value of money comes into play.
Running the NPV calculation again, you find the result is an NPV of around $9,140. The project still exceeds our investment goal, but the NPV has dropped by $5,060 from the initial calculation of $14,200.
That is around a 36% loss in project value simply from extending the project 7 months. The IRR has also dropped, from the initial 90 percent to 53 percent.
Have a realistic schedule and stress test your models.
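Putting the example and the stress test together in Python (reusing the npv() sketch from above and adding a simple bisection IRR helper of my own; exact figures depend on the discounting convention, so they will land near, but not exactly on, the numbers quoted in the text):

def npv(cash_flows, rate):
    """Net present value; periods numbered from 1 (repeated from above)."""
    return sum(cf / (1 + rate) ** n for n, cf in enumerate(cash_flows, start=1))

def irr(cash_flows, lo=-0.99, hi=10.0, tol=1e-7):
    """Periodic IRR via bisection: the rate that drives NPV to zero."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(cash_flows, mid) > 0:
            lo = mid  # NPV still positive, so the break-even rate is higher
        else:
            hi = mid
    return (lo + hi) / 2

# Monthly totals from the fixer-upper example (sale in month 5).
flows_5mo = [-24_200, -700, -700, -700, 48_300]
print(npv(flows_5mo, 0.04))  # positive at the 4%-per-month hurdle
print(irr(flows_5mo))        # ~0.17, i.e. roughly 17 percent per month

# Stress test: the sale slips to month 12 while carrying costs continue.
flows_12mo = [-24_200] + [-700] * 10 + [48_300]
print(npv(flows_12mo, 0.04))  # still positive, but sharply lower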
House Flipping Pro Forma

Category               Month 1       Month 2      Month 3    Month 4    Month 5
Acquisition
  Property             ($100,000)    -            -          -          -
  Closing Costs        ($6,000)      -            -          -          -
  SUBTOTAL             ($106,000)    -            -          -          -
Expenses
  Renovations/Repairs  -             ($500)       -          -          -
  Mortgage             ($525)        ($525)       ($525)     ($525)     ($525)
  Lawn Maintenance     ($75)         ($75)        ($75)      ($75)      ($75)
  Utilities/Trash      ($100)        ($100)       ($100)     ($100)     ($100)
  SUBTOTAL             ($700)        ($1,200)     ($700)     ($700)     ($700)
Sale
  Property Sale        -             -            -          -          $135,000
  Sales Costs          -             -            -          -          ($6,000)
CASH FLOW              ($106,700)    ($1,200)     ($700)     ($700)     $129,000

Project Returns
  Profit               $19,700
  OCC                  20%
  NPV                  $1,100
  IRR                  20%

Equity Returns
  Equity Cash Flow     ($26,700)     ($1,200)     ($700)     ($700)     $129,000
  Mortgage Repayment   -             -            -          -          ($80,000)
  Cash Flow to Equity  ($26,700)     ($1,200)     ($700)     ($700)     $49,000
  Profit               $19,700
  Req'd. Equity        ($29,300)
  OCC                  20%
  NPV                  $11,700
  IRR                  70%
Leverage
Leverage is thought of as more of a Wall Street or finance feature, but it exists in real estate too and can offer higher returns than an all-cash deal. If the example project were an all-cash deal, the cost would be $109,300.
With our example numbers, that is a net $19,700 in profit. We will assume no time value of money for simplicity, so the ratio of profit to equity is around 18 percent.  The actual profit might be a bit higher, because an all-cash deal means no loan fees or mortgage payments.
However, with leverage only $29,300 was put into the project and the rest was borrowed, so you're leveraging $29,300 on a $109,300 project. Assuming all other variables are the same, it's the same net profit as the all-cash deal, but the ratio of profit to equity is now 67 percent, a much higher return.
Leverage increases the project's profit potential by a factor of almost four. But leverage also comes with risks and can turn against you.
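The return-on-equity comparison is easy to verify (a sketch using the example's figures; like the text, it ignores the time value of money):

profit = 19_700
all_cash_equity = 109_300   # all-cash deal: the whole project cost is equity
leveraged_equity = 29_300   # leveraged deal: down payment plus carrying costs

print(profit / all_cash_equity)   # ~0.18 -> roughly an 18% return on equity
print(profit / leveraged_equity)  # ~0.67 -> roughly a 67% return on equity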
The pro forma can be thought of as a business plan for the project.  Before the pro forma, you were estimating and planning via your model, and with that model you weren't even looking at any physical properties yet.
Leveraging this system means you build a model of a project that offers a realistic representation of the kind of project you should look for.  If $30K was all that could be invested, the model showed you are limited to properties under $100K.   You also learned that you would need to sell the property for $35K more than you paid for it (a 35% increase).
Building on this model, you would narrow your search to neighborhoods with an average value of $135K and a sales price around $100K.  The model saves you time and shows you what to look for, but you still need to run a few stress tests and scenarios (using the model and the pro forma) where it might take over a year to sell the property.   Find out how long you have before you start losing money or use up your cash reserves.   Also play around with renovation costs, and add contingency lines for expenses (3-5%) if you think that might help.
  3. Artificial Neural Network Models & Data Sets
The benefit of this system is that, using ANNs, there is no need to assume an explicit functional form between inputs and outputs, because ANNs learn directly from the observed data.

The ANN + machine learning + NLP system used in this paper was trained with data gathered from the County of San Diego, which represents an expensive real estate market with pockets of affordability.
In particular, San Diego is characterized as one of the worst cities to build wealth, one of the least affordable cities in the United States, and a smaller job market compared to Los Angeles, San Francisco, Seattle, or New York City.

California as a whole is also rated near the bottom in education and school spending per pupil.  However, San Diego is rated one of the best places to retire, to work remotely, and to lead a more laid-back lifestyle.
  3.1 Multilayer Perceptron
MLP is a feed-forward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one.
  3.2 M5 Model Trees
M5P is a reconstruction of Quinlan's M5 algorithm for inducing trees of regression models and combines a conventional decision tree with the possibility of linear regression functions at the nodes.  
  4. Machine Learning Models
  4.1 SVM
SVM is mainly used to solve classification problems on samples of different categories, as well as regression problems. The classification problem mainly refers to seeking a hyperplane in a higher-dimensional space that separates the samples of different categories.
For SVM, multi-class classification can be solved by constructing multiple two-class classifiers.
  4.2 K-Nearest Neighbors
  4.3 Linear Regression
  4.4 Partial Least Squares Regression
 PLS finds the best-fitting function for the original data by minimizing the sum of the squared errors. Even when the independent variables are multiply correlated, all of the independent variables are retained in the final PLS regression model, and maximum information is extracted from the original data, which helps ensure the accuracy of the model.
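To make the modeling sections concrete, here is a minimal scikit-learn sketch that trains several of the models named above on a hypothetical tabular dataset of San Diego sales (the file name and column names are illustrative assumptions, not the paper's actual data):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.cross_decomposition import PLSRegression
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

# Hypothetical dataset: one row per sale, numeric features plus sale price.
df = pd.read_csv("san_diego_sales.csv")  # assumed file, not the paper's data
features = ["sqft", "beds", "baths", "crime_rate", "school_rating",
            "unemployment", "comps_median"]  # assumed column names
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["sale_price"], test_size=0.2, random_state=42)

models = {
    "linear regression": LinearRegression(),
    "svm (svr)": SVR(kernel="rbf", C=100.0),
    "k-nearest neighbors": KNeighborsRegressor(n_neighbors=5),
    "pls regression": PLSRegression(n_components=3),
    "mlp (ann)": MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000),
}
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)  # scaling matters for SVR/MLP
    pipe.fit(X_train, y_train)
    preds = pipe.predict(X_test)
    print(f"{name}: MAE = {mean_absolute_error(y_test, preds):,.0f}")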
  5. Natural Language Processing Models
Modern NLP algorithms are based on machine learning, especially statistical machine learning.  Many different classes of machine learning algorithms have been applied to NLP tasks. These algorithms take as input a large set of "features" that are generated from the input data. Some of the earliest-used algorithms, such as decision trees, produced systems of hard if-then rules similar to the systems of hand-written rules that were then common. Increasingly, however, research has focused on statistical models, which make soft, probabilistic decisions based on attaching real-valued weights to each input feature.
Such models have the advantage that they can express the relative certainty of many different possible answers rather than only one, producing more reliable results when such a model is included as a component of a larger system.
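In the same spirit, a minimal sketch of that statistical approach: a bag-of-words model that attaches real-valued weights to text features and makes soft, probabilistic decisions about snippets of unstructured listing text (the tiny inline dataset is purely illustrative):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative labeled snippets of unstructured listing/neighborhood text.
texts = [
    "quiet street, great schools, recently remodeled",
    "needs major repairs, high-crime area, short sale",
    "walkable neighborhood, low crime, strong rental demand",
    "foreclosure, water damage, long commute",
]
labels = [1, 0, 1, 0]  # 1 = positive signal, 0 = negative signal

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

# A soft, probabilistic decision rather than a hard if-then rule.
print(clf.predict_proba(["remodeled home near good schools"])[0])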
  6. Results & Discussion
  7. Conclusion
References
Mu, Jingyi, Fang Wu, and Aihua Zhang (2014) - Housing Value Forecasting Based on Machine Learning Methods
Acciani, Claudio et al. (2008) - Model Tree: An Application in Real Estate Appraisal
Limsombunchai, Visit et al. (2004) - House Price Prediction: Hedonic Price Model vs. Artificial Neural Network
McCluskey, William and Sarabjot Anand (1999) - The Application of Intelligent Hybrid Techniques for the Mass Appraisal of Residential Properties
Peterson, Steven - Neural Network Hedonic Pricing Models in Mass Real Estate Appraisal
Van Wezel, Michiel et al. (2005) - Boosting the Accuracy of Hedonic Pricing Models
realmarkits.com (2014)

Wednesday, October 12, 2016

Getting there but...

http://www.odditycentral.com/news/russian-programmer-ressurects-deceased-best-friend-as-an-ai-chatbot.html

Thursday, October 6, 2016

Data science for IoT


What is Data Science for the Internet of Things (IoT)?

Some good details and a framework here.
