In this series of articles, I’m going to describe a low-latency risk assessment & mitigation platform built on top of a relational SQL database. The purpose of our setup is to provide a live query subsystem capable of handling large quantities of requests whilst keeping load bursts isolated from the main operational cluster.
We’re going to work with a common fintech dataset revolving around financial transactions and the risk ratings of individuals. It could be utilized in a variety of applications, including:
- Line of credit products
- Insurance platforms
- Credit card issuers
- Installment loan lenders
- Lead processors / verification centers
Before we dive deep into the technical aspects, let’s briefly discuss the nature of our queries, which could be described by the following characteristics:
- Our system needs to deliver answers rapidly in order to be used as a part of synchronous data processing. The fintech product we offer requires an unambiguous response from our platform in order to deliver its core functionality. Increased latency will result in degraded performance of our product, while lack of response will cause processing errors.
Example: if the Risk Mitigation queries are used to approve or deny credit card purchases, excessive processing times might lead to timed-out transactions.
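The deadline requirement can be sketched as a hard timeout wrapped around the risk query. A minimal Python illustration, where `assess_risk` is a hypothetical stand-in for the live query (the function names, the 200 ms budget, and the `TIMEOUT` sentinel are all assumptions for the sketch, not the platform's actual API):

```python
import concurrent.futures

def assess_risk(applicant_id: str) -> str:
    """Hypothetical stand-in for the live risk query."""
    return "APPROVE"

def assess_with_deadline(applicant_id: str, timeout_s: float = 0.2) -> str:
    """Run the risk query under a hard deadline; fail fast instead of hanging."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(assess_risk, applicant_id)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            # The caller decides how to treat a missed deadline
            # (decline, retry, fall back to a cached decision, etc.).
            return "TIMEOUT"

print(assess_with_deadline("A-1001"))
```

The point is that the caller always gets an unambiguous answer within the budget, which is what synchronous processing requires.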
- We want to rely on our existing data rather than on third-party decision-making. If a third-party service is used, it should only be considered an extra input for our own analysis. Our system needs to be capable of self-assessing the importance (weight) of each parameter being measured.
Example: if data from a credit rating agency is used to evaluate a potential borrower, their scoring should be considered only one of many factors being analyzed. By retrospectively analyzing our existing loan book’s performance against the historical credit ratings of its borrowers, we should be able to benchmark that particular agency’s credibility.
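One toy way to picture such retrospective benchmarking: treat the agency’s score as a binary predictor of default and weight the agency by how often it agreed with our actual loan book outcomes. The records, the score cutoff, and the agreement metric below are illustrative assumptions, not the platform’s actual method:

```python
# Hypothetical loan book sample: (agency_score, borrower_defaulted)
historical = [
    (780, False), (710, False), (690, True), (640, True),
    (720, False), (600, True), (750, False), (660, False),
]

def agency_weight(records, cutoff=700):
    """Share of loans where the agency's score band agreed with the outcome.

    Scores >= cutoff are read as predicting repayment, below as predicting
    default; the weight is simply the fraction of correct calls."""
    agree = sum((score >= cutoff) != defaulted for score, defaulted in records)
    return agree / len(records)

print(agency_weight(historical))  # 7 of 8 calls agree with outcomes -> 0.875
```

A real implementation would use a proper calibration measure over the full portfolio, but the idea is the same: the agency earns its weight from our own data.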
- Our solution needs to self-improve by regularly acquiring more live operations data. It must also be able to identify insufficient information and auto-tune itself by supplementing it with alternative data sources.
Example: when evaluating new customers, a health insurance company looks at age as a main contributing factor in assessing risk. When it decides to target pensioners with new policies for the first time, its existing portfolio will not contain enough data to provide meaningful population samples. Another set of parameters has to be processed to provide risk mitigation intelligence for this new group of customers.
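The auto-tuning described above can be sketched as a sample-sufficiency check: when a customer segment lacks history, supplement the primary parameters with alternative data sources. All segment names, feature names, and the threshold here are hypothetical:

```python
MIN_SAMPLE = 50  # illustrative threshold for a meaningful population sample

def choose_features(portfolio_counts: dict, segment: str,
                    primary=("age", "claims_history"),
                    fallback=("occupation", "prescription_data")):
    """Use primary features when the segment has enough history;
    otherwise supplement them with alternative data sources."""
    n = portfolio_counts.get(segment, 0)
    return primary if n >= MIN_SAMPLE else primary + fallback

counts = {"working_age": 12000, "pensioners": 7}
print(choose_features(counts, "working_age"))  # primary features suffice
print(choose_features(counts, "pensioners"))   # sparse segment: fall back
```

In practice the trigger would be a statistical test rather than a raw count, but the shape of the decision is the same: detect the gap, then widen the input set.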
I am going to describe the software stack used to achieve the goals outlined above in a series of short notes, each focused on an individual aspect of the setup:
- Relational Database
- OLAP vs Deep Learning
- Dynamic Mondrian Cubes
- Implementation Constraints (ORM design, conflict monitoring)