Big Data has been all the rage. Business, marketing and project managers like it because they can plot trends to inform decisions. To us developers, Big Data is just a bunch of logs. In this blog post, I would like to point out that Big Data (or logs with context) can be leveraged by development teams to understand how our APIs are used.
Developers have implemented logging for a very long time. There are transaction logs, error logs, access logs and more. So, how has logging changed today? Big Data is not all that different from logging. In fact, I would consider Big Data logs to be logs with context. Context allows you to do interesting things with the data. Now, we can correlate user activity with what’s happening in the system.
A Different Type of Log
So, what are logs? Logs are records of events, and are frequently created by applications with very little user interaction. It goes without saying that many logs are transaction logs or error logs.
However, there is a difference between forensics logs and business logs. Big Data is normally associated with the events, actions and behaviors of users when they use the system. Examples include records of purchases, which are linked to a user profile and span a period of time. We call these business logs. Data and business analysts would love to get hold of this data, run some machine learning algorithms on it, and predict the outcome of a certain decision in order to improve the user experience.
Now back to the developer. How does Big Data help us? On our end, we can utilize forensics logs. Logs get more interesting and helpful if we can combine records from multiple sources. Imagine hooking into IIS logs, method logs and performance counters and correlating them.
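As a rough illustration of combining records from multiple sources, here is a minimal Python sketch. It assumes every source already stamps its records with a shared `request_id` field, which is an assumption for illustration only (web server logs do not carry one unless you add it). The field names and sample records are made up.

```python
from collections import defaultdict


def correlate(*sources):
    """Group records from several log sources by a shared request ID.

    Each source is a (source_name, records) pair. The "request_id" field
    is a hypothetical correlation key, not something the logs provide
    out of the box.
    """
    merged = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources:
        for rec in records:
            merged[rec["request_id"]][source_name].append(rec)
    # Convert to plain dicts so missing keys raise instead of being created.
    return {rid: dict(by_source) for rid, by_source in merged.items()}


# Made-up sample records standing in for an access log and a method log.
access_log = [{"request_id": "r1", "path": "/orders", "status": 500}]
method_log = [
    {"request_id": "r1", "method": "create_order", "elapsed_ms": 812.0},
    {"request_id": "r1", "method": "charge_card", "elapsed_ms": 790.0},
]

combined = correlate(("access", access_log), ("method", method_log))
```

With the records grouped this way, a failed request in the access log can be read side by side with the method calls it triggered.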
Big Data for Monitoring and Forensics
I would like to advocate that Big Data can and should be leveraged by web service developers to:
- Better understand the system and improve the performance of critical paths
- Investigate failure trends which might lead to errors or exacerbate current issues.
Logs can include:
- Method calls (including the context of the call: user login, IP address, parameter values, return values, etc.)
- Execution time of method
- Chain of calls (e.g. method names, server names etc.), which can be used to trace where method calls originate
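To make the list above concrete, here is a minimal Python sketch of a decorator that emits one structured record per method call, including the call context, parameters, return value and execution time. The names `log_call` and `emit`, the record fields, and the sample context values are all hypothetical. A real system would ship these records to its log pipeline rather than append them to a list.

```python
import functools
import json
import time


def log_call(emit, context):
    """Wrap a function so that every call emits one JSON log record.

    `emit` is any callable that accepts a string; `context` holds
    per-request details such as the user login and IP address.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            record = {
                "method": func.__name__,
                "context": dict(context),
                "args": [repr(a) for a in args],
                "kwargs": {k: repr(v) for k, v in kwargs.items()},
            }
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
                record["result"] = repr(result)
                return result
            except Exception as exc:
                # Keep the failure details next to the parameters that caused it.
                record["error"] = f"{type(exc).__name__}: {exc}"
                raise
            finally:
                elapsed = time.perf_counter() - start
                record["elapsed_ms"] = round(elapsed * 1000, 3)
                emit(json.dumps(record))
        return wrapper
    return decorator


records = []  # stand-in for a real log sink


@log_call(records.append, {"user": "alice", "ip": "203.0.113.7"})
def get_profile(user_id):
    return {"id": user_id}


get_profile(42)
```

Because the record is emitted in the `finally` clause, failed calls are logged with the same context and parameters as successful ones.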
With so much data being logged for every single call, it is important that the logging system can hold and process huge volumes of data. Big Data has to be handled on a whole different scale. The screenshots below are charts from Kibana. Please refer here to find out how to set up data collection and dashboard display using this suite of open source tools.
Based on the decision as to what kind of monitoring is required, the relevant information (e.g. context, method latency, class/method names) should be included in Big Data logs.
Detecting Problematic Dependencies
Plotting the time spent in the classes of incoming and outgoing components gives us visibility into the proportion of time spent in each layer of the service. The plot below revealed that the service was spending more and more time in a particular component, which warranted an investigation.
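The aggregation behind such a plot can be sketched in a few lines of Python. This is only an illustration: the `window`, `component` and `elapsed_ms` fields and the sample values are made up, and a dashboard tool would normally compute this for you.

```python
from collections import defaultdict


def time_share_per_window(records):
    """Return {window: {component: fraction of total time in that window}}."""
    totals = defaultdict(lambda: defaultdict(float))
    for rec in records:
        totals[rec["window"]][rec["component"]] += rec["elapsed_ms"]

    shares = {}
    for window, by_component in totals.items():
        window_total = sum(by_component.values())
        if window_total > 0:
            shares[window] = {
                name: ms / window_total for name, ms in by_component.items()
            }
    return shares


# Made-up latency records for two time windows.
sample = [
    {"window": "17:00", "component": "api", "elapsed_ms": 50.0},
    {"window": "17:00", "component": "database", "elapsed_ms": 50.0},
    {"window": "17:05", "component": "api", "elapsed_ms": 25.0},
    {"window": "17:05", "component": "database", "elapsed_ms": 75.0},
]

shares = time_share_per_window(sample)
```

In this sample, the `database` share grows from 0.5 to 0.75 between the two windows, which is the kind of trend that would prompt a closer look.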
Discovering Faulty Queries
Logging all exceptions, together with the appropriate error messages and details, allows developers to determine the circumstances under which a method fails. The plot below shows that MySql exceptions started occurring at 17:30. Because the team included parameters in the logs, we were able to determine that invalid queries were being used (typos and syntax errors).
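As a sketch of how such a plot could be derived from the raw records, the Python below counts exceptions per time bucket and exception type. The record layout (`timestamp`, `error`, `params`), the date and the sample queries are invented for illustration.

```python
from collections import Counter
from datetime import datetime


def errors_per_bucket(records, bucket_minutes=10):
    """Count error records per (bucket start time, error type)."""
    counts = Counter()
    for rec in records:
        if "error" not in rec:
            continue  # successful calls are not part of this chart
        ts = datetime.fromisoformat(rec["timestamp"])
        bucket = ts.replace(
            minute=ts.minute - ts.minute % bucket_minutes,
            second=0,
            microsecond=0,
        )
        counts[(bucket, rec["error"])] += 1
    return counts


# Made-up records; the failing ones keep the parameters that were passed in.
sample = [
    {"timestamp": "2024-01-01T17:25:00", "params": "SELECT 1"},
    {"timestamp": "2024-01-01T17:31:10", "error": "MySqlException",
     "params": "SELEC * FROM orders"},
    {"timestamp": "2024-01-01T17:38:45", "error": "MySqlException",
     "params": "SELECT * FORM orders"},
]

counts = errors_per_bucket(sample)
first_bad = min(bucket for bucket, _ in counts)
```

Once the first failing bucket is known, the `params` field of the failing records can be inspected to see exactly which queries were sent.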
Determining Traffic Patterns
Tapping into the IP addresses of incoming requests reveals very interesting traffic patterns. In the example below, the graph indicates a spike in traffic. However, on closer inspection, the graph shows that the spike spanned ALL countries. We concluded that the spike was not due to user behavior, which led us to investigate other possible causes (e.g., DOS attacks, simultaneous updates for mobile apps, errors in logs, etc.). In this case, we found that it was a false positive caused by log forwarders repeatedly reading the same entries and sending them through the logging infrastructure.
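The "did the spike hit every country at once?" check can be expressed as a small Python function. This is a sketch under stated assumptions: the request counts per country are already available (for example from an IP-to-country lookup), and the threshold `factor` and the sample numbers are arbitrary.

```python
def spike_is_global(baseline, current, factor=2.0):
    """Return True if every country with baseline traffic rose by `factor`.

    `baseline` and `current` map a country code to a request count for
    a normal period and for the period under investigation.
    """
    countries = [code for code, count in baseline.items() if count > 0]
    if not countries:
        return False
    return all(
        current.get(code, 0) >= factor * baseline[code] for code in countries
    )


# Made-up request counts per country.
baseline = {"US": 100, "GB": 40, "JP": 30}
uniform_spike = {"US": 310, "GB": 125, "JP": 90}
local_spike = {"US": 310, "GB": 41, "JP": 29}

print(spike_is_global(baseline, uniform_spike))  # True
print(spike_is_global(baseline, local_spike))    # False
```

A spike that is uniform across all countries is a hint to look at the infrastructure first, before assuming that users changed their behavior.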
Determining Faulty Dependents (as opposed to dependencies)
Big Data log generation can be enhanced to include IDs that track the chain of service calls from clients through the various services in the system. The first column below indicates that traffic from the iOS mobile app passes through the External API gateway before reaching our service. Other columns indicate different flows, giving developers enough information to detect problems and isolate them to different systems if needed.
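One way to carry such IDs is to pass them along in request headers, with each hop appending its own name. The Python sketch below illustrates the idea only. The header names `X-Request-Id` and `X-Call-Chain` and the service names are hypothetical and are not taken from the framework described in this post.

```python
import uuid


def next_hop_headers(incoming, service_name):
    """Build the tracking headers a service sends on its outgoing calls.

    The request ID is reused if the caller supplied one and created
    otherwise; the call chain gains the current service's name.
    """
    request_id = incoming.get("X-Request-Id") or uuid.uuid4().hex
    chain = [hop for hop in incoming.get("X-Call-Chain", "").split(",") if hop]
    chain.append(service_name)
    return {"X-Request-Id": request_id, "X-Call-Chain": ",".join(chain)}


# Simulate a request travelling from a client through two services.
from_app = next_hop_headers({}, "ios-app")
from_gateway = next_hop_headers(from_app, "external-api-gateway")
at_service = next_hop_headers(from_gateway, "our-service")

print(at_service["X-Call-Chain"])
# ios-app,external-api-gateway,our-service
```

If every service writes both values into its log records, the records for one request can be grouped by ID and the originating client read straight from the chain.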
Tracking Progression Through Various Services
Ancestry.com has implemented a Big Data framework across all services to support call tracking across different services. This helps developers (who are knowledgeable about the underlying architecture) to debug whenever a scenario doesn’t work as expected. The graph below depicts different methods being exercised across different services, where each color refers to a single scenario. Such data provides full visibility into the interactions among different services across the organization.
Forensic logs can be harnessed and used with Big Data tools and frameworks to greatly improve the effectiveness of development teams. By combining various views (such as the examples above) into a single dashboard, we are able to provide developers with a health snapshot of the system at any time in order to identify failures or to improve architectural designs.
By leveraging Big Data for forensics logging, we, as developers, are able to determine faults and reproduce error messages without conventional debugging tools. We have full visibility into the various processes in the system (assuming we have sufficient logs). Gone are the days when we needed to instrument code on LIVE boxes because an issue only occurred in the LIVE environment.
All of this work is done independently of the Business Analysts and is, in fact, crucial to the team's ability to react quickly to issues and to continuously improve the system.
Do your developers use Big Data as part of daily development and maintenance of web services? What would you add to increase visibility in the system and to reduce bug-detection time?