Transforming a Regulatory Enterprise to Big Data and Cloud
30 BILLION EVENTS PER DAY
The business world feels like it is in the grip of a technology revolution, but this is a misconception. In fact, there are two complementary, parallel revolutions in play.
The first of these revolutions is big data. New horizontally scalable architectures make it feasible to use commodity hardware to rapidly process virtually unlimited volumes of data.
The Financial Industry Regulatory Authority (FINRA) performs large scale big data analytics. FINRA is an independent securities regulator authorized by Congress to promote investor protection and market integrity. FINRA oversees U.S. securities firms and their registered representatives and monitors trading in the U.S. markets. As part of this market monitoring activity, FINRA processes approximately 30 billion market events every day to build a holistic picture of trading in the U.S. The resulting picture is subjected to an extensive library of surveillance scenarios to identify potential insider trading and market manipulation.
The technical challenges are aggressive:
• Market volumes are volatile and steadily increasing;
• Securities exchanges are dynamically evolving;
• New regulatory rules are created and enhanced;
• New securities products are regularly developed and introduced; and
• Market manipulators are innovating.
As technologies like Hadoop have matured, it has become clear that they can provide FINRA with a strategic advantage in meeting these challenges. Using these technologies effectively at large scale requires massive clusters of servers and comprehensive automation of server management and provisioning.
The second revolution is cloud computing. Third-party cloud vendors provide economies of scale in hardware provisioning. The best vendors also provide high-grade security controls, encryption and other platform services that reduce development and operating costs.
The key economic benefits for a big data customer are obtained from elasticity and automation. The elasticity comes from provisioning only the required amount of infrastructure and then releasing those resources (thus saving costs) when the need has passed. Automation comes from using platforms to script maintenance and production support tasks that would otherwise require manual intervention.
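The elasticity argument can be made concrete with a small calculation. The sketch below is illustrative only: the function names, the hourly demand figures and the rate are all made up for the example and do not describe FINRA's actual workloads or AWS pricing. It compares paying for peak capacity around the clock with paying only for the capacity each hour actually needs.

```python
def fixed_cost(hourly_nodes_needed, rate_per_node_hour):
    """Cost when capacity is sized for the peak hour and kept running throughout."""
    peak = max(hourly_nodes_needed)
    return peak * len(hourly_nodes_needed) * rate_per_node_hour


def elastic_cost(hourly_nodes_needed, rate_per_node_hour):
    """Cost when only the nodes needed in each hour are provisioned, then released."""
    return sum(hourly_nodes_needed) * rate_per_node_hour


# Hypothetical demand profile: nodes required in each of four consecutive hours.
demand = [2, 2, 10, 4]
rate = 1  # arbitrary cost unit per node-hour (assumption)

print("fixed:  ", fixed_cost(demand, rate))    # 10 nodes x 4 hours
print("elastic:", elastic_cost(demand, rate))  # 2 + 2 + 10 + 4 node-hours
```

The more volatile the demand, the wider the gap between the two figures, which is why elasticity matters most for workloads that track unpredictable market volumes.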
DEVELOPING EXPERTISE AND ACCEPTANCE
FINRA began with a thorough understanding of the business domain and the technical nature of the problem yet to be solved. We had already implemented a sophisticated system in our private data center using data processing appliances, various SAN and NAS technologies, Hadoop clusters and both virtual and physical server farms.
What was needed was knowledge of the trade-offs posed by the new cloud and big data technologies, so that we could architect and implement an effective solution. We gained this knowledge by creating a skunkworks project staffed by influential managers and key technical personnel, operating according to defined principles:
• all initial learning was to be hands-on to the degree possible, with minimal reliance on pundits’ articles and whitepapers;
• short daily meetings would review what had been learned and next steps;
• as the executive responsible, I made a point of attending all meetings; and
• business and operational requirements were excluded from consideration to avoid the trap of setting conservative goals.
Amazon Web Services (AWS) was an obvious choice for initial proof-of-concept projects, since it provided the most flexible offering and the lowest barriers to entry.
We started with an extremely small team and grew it incrementally, week by week, until it reached the critical mass required to effect enterprise-wide change. As more staff became involved in the project and more knowledge was gained about the tangible capabilities of big data and cloud, excitement and enthusiasm mounted in the organization.
The knowledge gained from the early proof-of-concept (POC) projects allowed us to draft a realistic prospective architecture, which was used to identify specific risk areas, primarily pertaining to performance and security. With these risks identified and at least partially mitigated, it was possible to create a solid business case. The earlier work gave us confidence that we could build a well-performing system at a reasonable cost.
When it came time to select specific cloud vendors, our technical staff had extensive hands-on experience with AWS and were well versed in big data trade-offs. Meetings with alternate vendors moved rapidly past the pre-sales and PowerPoint presentations and on to hands-on evaluations. The test suites and prototypes that had been developed initially on AWS were ported to other candidate cloud platforms, allowing conclusions to be reached rapidly.
AWS platforms allow real-time provisioning of resources under application control. This means that when new data arrives for processing, the application can recognize its arrival, connect it with previously received related data, spin up a cluster of temporary servers to process the incoming data, deposit the result in a fault-tolerant repository and decommission the temporary servers, all with no human intervention.
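The control flow described above can be sketched in a few lines. This is a minimal, self-contained simulation and not FINRA's implementation: the `Pipeline` and `Cluster` classes, the sizing rule and the in-memory dictionaries are hypothetical stand-ins for the real cluster provisioning and storage services, and "processing" is reduced to counting events.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cluster:
    """Stand-in for a temporary processing cluster (hypothetical)."""
    node_count: int
    active: bool = True


@dataclass
class Pipeline:
    """Hypothetical controller that reacts to newly arrived data."""
    received: Dict[str, List[str]] = field(default_factory=dict)  # earlier data, keyed by batch
    results: Dict[str, int] = field(default_factory=dict)         # stand-in for the durable result store
    clusters: List[Cluster] = field(default_factory=list)         # every cluster ever provisioned

    def provision(self, node_count: int) -> Cluster:
        """Pretend to start a cluster; a real system would call the cloud API here."""
        cluster = Cluster(node_count)
        self.clusters.append(cluster)
        return cluster

    def on_arrival(self, batch: str, events: List[str]) -> int:
        # 1. Connect the new data with previously received related data.
        related = self.received.setdefault(batch, [])
        related.extend(events)
        # 2. Spin up a temporary cluster sized to the workload (arbitrary sizing rule).
        cluster = self.provision(node_count=max(1, len(related) // 2))
        try:
            # 3. Process the data (here: count events) and deposit the result.
            self.results[batch] = len(related)
        finally:
            # 4. Always decommission the cluster, even if processing fails.
            cluster.active = False
        return self.results[batch]


pipeline = Pipeline()
pipeline.on_arrival("day-1", ["order-1", "trade-1"])
pipeline.on_arrival("day-1", ["order-2"])
print(pipeline.results)
print([c.active for c in pipeline.clusters])
```

Releasing the cluster in a `finally` block reflects the cost argument: a failed job should not leave paid-for servers running with no one watching.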
Several AWS products are fundamental to enabling this approach. The first is S3, which provides a single, contiguous, highly redundant data store. The contiguous and high-bandwidth nature of S3 makes it very suitable for efficient big data processing. The EMR platform allows dynamic clusters to be commissioned and decommissioned readily under application control. The near-line storage platform, Glacier, provides programmatic restoration of older data from a multi-petabyte data set for retrospective processing. The EC2 server farms provide for efficient loading and operating of Hadoop and HBase clusters.
AWS platforms are “open source friendly.” Open source software is key to our strategy in the rapidly evolving big data solutions environment because it gives us portability and innovation. The first reduces our dependence on a single vendor, and the second keeps us current with technological advances without significant internal investment.
CHANGING WITH THE MARKET
Big data approaches like Hadoop allow us to scale to whatever volumes are required. Doing this on the AWS cloud gives us affordable, secure and elastic access to variable computing power and data storage as required. Using open source software ensures that we can remain vendor agnostic and benefit from innovation in the big data space. Finally, application-controlled data management and resource provisioning ensure cost-efficient scaling with stock market growth and innovations.