Forecasting Earning Surprises with Machine Learning

How to predict which companies will Beat or Miss their Analyst Earnings Estimates

Listed companies publish quarterly earnings reports, which can cause significant price movements when the results deviate from what analysts had estimated. According to the efficient-market hypothesis, asset prices fully reflect all available information and therefore already factor in consensus estimates, so it is the surprise relative to those estimates that moves the price. In this article we are going to see how we can use Machine Learning to predict whether a company will beat or miss its estimates.

The Data

We consider EPS analyst estimates from the Thomson Reuters I/B/E/S Estimates database, downloaded from Sentieo. The database gathers and compiles the estimates made by analysts for more than 20 measures. For each company, we are given the Mean, # Estimates, Low, High and Actual values of the estimates, as shown below:

Unfortunately, this database gives us only about 70 data points per company, which isn't enough to predict one company's earnings from its previously announced results and their beat/miss record against estimates. We can, however, reframe the problem to increase the number of data points.

Instead of asking ourselves whether a company will beat or miss the estimates, we can ask whether the estimates will be higher or lower than the actual values.

We will then normalise the values in order to aggregate them. In this case the features we will be considering for our model are:

  • # Estimates
  • Low/Mean %
  • High/Mean %
  • Actual/Mean %
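As a sketch of this normalisation step, the snippet below builds the four features from a small made-up frame of estimates; the column names and values are illustrative assumptions, not the actual I/B/E/S field names.

```python
import pandas as pd

# Hypothetical raw estimate data; real I/B/E/S column names may differ.
df = pd.DataFrame({
    "num_estimates": [12, 8, 15],
    "low":    [0.90, 1.10, 2.40],
    "high":   [1.10, 1.35, 2.80],
    "mean":   [1.00, 1.20, 2.60],
    "actual": [1.05, 1.18, 2.55],
})

# Normalise Low, High and Actual by the Mean estimate so that rows
# from different companies can be pooled into one training set.
df["low_pct"] = df["low"] / df["mean"]
df["high_pct"] = df["high"] / df["mean"]
df["actual_pct"] = df["actual"] / df["mean"]

features = df[["num_estimates", "low_pct", "high_pct", "actual_pct"]]
print(features)
```

Because every feature is expressed relative to the consensus mean, the scale of a company's EPS no longer matters and the rows become comparable across firms.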

We then decided to aggregate the estimates by sector in order to test the hypothesis that analysts' (in)ability to forecast earnings accurately is tied to the nature of the firms. For this study we are going to focus on healthcare stocks.

We then took over 6,000 estimates for the following 117 companies:


Processing the Data

We uploaded the data to AuDaS, a Data Science and education platform built for analysts by Mind Foundry. In order to improve the accuracy of the model, we created a new column representing whether the actual value was higher (1) or lower (-1) than the mean estimate (as opposed to a %).
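A minimal sketch of that labelling step, assuming the normalised Actual/Mean column from earlier (names are illustrative; how the original handled exact ties is not stated, so ties are labelled -1 here):

```python
import numpy as np
import pandas as pd

# Hypothetical normalised Actual/Mean values for four earnings events.
df = pd.DataFrame({"actual_pct": [1.05, 0.98, 1.00, 1.12]})

# Binary target: 1 when the actual beat the mean estimate, -1 otherwise.
df["beat_miss"] = np.where(df["actual_pct"] > 1.0, 1, -1)
print(df)
```

Turning the continuous surprise into a 1/-1 label converts the task into a straightforward binary classification problem.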
