Pull to refresh
157.52

System Analysis and Design *

Analyze and project

Show first
Rating limit
Level of difficulty

Tarantool: an analyst's view

Reading time8 min
Views1.9K
Hi all! I'm Andrey Kapustin. I work as a system analyst at Mail.ru Group. Our products form a unified ecosystem. Many independent infrastructures generate data in it: taxi and food delivery services, email services, social networks, etc. The faster and more precise we can predict a client's needs, the sooner and more correctly we can offer our products. 

Many system analysts and engineers are keen to know: 

  1. How to design the architecture of a trigger platform for real-time marketing?
  2. How to arrange a data structure that would be in line with the requirements of a marketing strategy for interacting with clients?
  3. How to ensure the stable operations of the  system under very heavy workloads? 

Such systems are based on technologies of high-load processing and Big Data analysis. We have accumulated considerable experience in these areas. Our expertise is in high demand on the market.  I'm going to show how we help our customers to switch from off-line to on-line in their interactions with clients using Real-Time Marketing solutions based on Tarantool.
Read more →
Total votes 26: ↑26 and ↓0+26
Comments0

Making a Tarantool-Based Investment Business Core for Alfa-Bank

Reading time10 min
Views1.8K

A still from «Our Secret Universe: The Hidden Life of the Cell»

Investment business is one of the most complex domains in the banking world. It's about not just credits, loans, and deposits — there are also securities, currencies, commodities, derivatives, and all kinds of complex stuff like structured products.

Recently, people have become increasingly aware of their finances. More and more get involved in securities trading. Individual investment accounts have emerged not so long ago. They allow you to trade in securities and get tax credits or avoid taxes at the same time. All clients coming to us want to manage their portfolios and see their reporting on-line. Most frequently, these are multi-product portfolios, which means that people are clients of different business areas.

Moreover, the demands of regulators, both Russian and international, also grow.

To meet the current needs and lay a foundation for future upgrades, we've developed our Tarantool-based investment business core.
Read more →
Total votes 14: ↑14 and ↓0+14
Comments0

Qrator filtering network configuration delivery system

Reading time6 min
Views1.3K


TL;DR: Client-server architecture of our internal configuration management tool, QControl.
At its basement, there’s a two-layered transport protocol working with gzip-compressed messages without decompression between endpoints. Distributed routers and endpoints receive the configuration updates, and the protocol itself makes it possible to install intermediary localized relays. It is based on a differential backup (“recent-stable,” explained further) design and employs JMESpath query language and Jinja templating for configuration rendering.

Qrator Labs operates on and maintains a globally distributed mitigation network. Our network is anycast, based on announcing our subnets via BGP. Being a BGP anycast network physically located in several regions across the Earth makes it possible for us to process and filter illegitimate traffic closer to the Internet backbone — Tier-1 operators.

On the other hand, being a geographically distributed network bears its difficulties. Communication between the network points-of-presence (PoP) is essential for a security provider to have a coherent configuration for all network nodes and update it in a timely and cohesive manner. So to provide the best possible service for customers, we had to find a way to synchronize the configuration data between different continents reliably.
In the beginning, there was the Word… which quickly became communication protocol in need of an upgrade.
Read more →
Total votes 24: ↑23 and ↓1+22
Comments0

Citymobil — a manual for improving availability amid business growth for startups. Part 5

Reading time8 min
Views1K


This is the final part of the series describing how we’re increasing our service availability in Citymobil (you can read the previous part here). Now I’m going to talk about one more type of outages and the conclusions we made about them, how we modified the development process, what automation we introduced.
Read more →
Total votes 24: ↑24 and ↓0+24
Comments0

Citymobil — a manual for improving availability amid business growth for startups. Part 4

Reading time7 min
Views1K


This is the next article of the series describing how we’re increasing our service availability in Citymobil (you can read the previous parts here: part 1, part 2, part 3). In further parts, I’ll talk about the accidents and outages in detail.

1. Bad release: database overload


Let me begin with a specific example of this type of outage. We deployed an optimization: added USE INDEX in an SQL query; during testing as well as in production, it sped up short queries, but the long ones — slowed down. The long queries slowdown was only noticed in production. As a result, a lot of long parallel queries caused the database to be down for an hour. We thoroughly studied the way USE INDEX worked; we described it in the Do’s and Dont’s file and warned the engineers against the incorrect usage. We also analyzed the query and realized that it retrieves mostly historical data and, therefore, can be run on a separate replica for historical requests. Even if this replica goes down due to an overload, the business will keep running.
Read more →
Total votes 17: ↑16 and ↓1+15
Comments0

Citymobil — a manual for improving availability amid business growth for startups. Part 3

Reading time8 min
Views1.1K


This is the next article of the series describing how we’re increasing our service availability in Citymobil (you can read the previous parts here and here). In further parts, I’ll talk about the accidents and outages in detail. But first let me highlight something I should’ve talked about in the first article but didn’t. I found out about it from my readers’ feedback. This article gives me a chance to fix this annoying shortcoming.
Read more →
Total votes 23: ↑23 and ↓0+23
Comments0

Citymobil — a manual for improving availability amid business growth for startups. Part 2

Reading time8 min
Views979


This is a second article out of a series «Citymobil — a manual for improving availability amid business growth for startups». You can read the first part here. Let’s continue to talk about the way we managed to improve the availability of Citymobil services. In the first article, we learned how to count the lost trips. Ok, we are counting them. What now? Now that we are equipped with an understandable tool to measure the lost trips, we can move to the most interesting part — how do we decrease losses? Without slowing down our current growth! Since it seemed to us that the lion’s share of technical problems causing the trips loss had something to do with the backend, we decided to turn our attention to the backend development process first. Jumping ahead of myself, I’m going to say that we were right — the backend became the main site of the battle for the lost trips.
Read more →
Total votes 23: ↑22 and ↓1+21
Comments0

Citymobil — a manual for improving availability amid business growth for startups. Part 1

Reading time4 min
Views1.3K


In this first part of an article series «Citymobil — a manual for improving availability amid business growth for startups» I’m going to break down the way we managed to dramatically scale up the availability of Citymobil services. The article opens with the story about our business, our task, the reason for this task to increase the availability emerged and limitations. Citymobil is a rapid-growing taxi aggregator. In 2018, it increased by more than 15 times in terms of number of successfully completed trips. Some months showed 50% increase compared with the previous month.

The business grew like a weed in every direction (it still does): there was an increase in server load, team size and number of deployments. At the same time the new threats to service availability emerged. The company faced a task of the most importance — how to increase availability without compromising company growth. In this article, I’ll talk about the way we managed to solve this task in a relatively short time.
Read more →
Total votes 24: ↑24 and ↓0+24
Comments0

Crystal Blockchain Analytics: Investigating the Hacks and Theft Cases

Reading time8 min
Views2.7K
In this report, Bitfury shares analysis completed by its Crystal Blockchain Analytics engineering team on the movement of bitcoin from the Zaif exchange, Bithumb exchange and Electrum wallets.

Read more →
Total votes 15: ↑13 and ↓2+11
Comments0

Generic Methods in Rust: How Exonum Shifted from Iron to Actix-web

Reading time13 min
Views5.9K
The Rust ecosystem is still growing. As a result, new libraries with improved functionality are frequently released into the developer community, while older libraries become obsolete. When we initially designed Exonum, we used the Iron web-framework. In this article, we describe how we ported the Exonum framework to actix-web using generic programming.

Read more →
Total votes 28: ↑27 and ↓1+26
Comments0

Automation VS Chaos

Reading time5 min
Views1.1K
image

IT technologies evolution allowed to control huge data flows. Business has a lot of IT solutions: CRM, ERP, BPM, accounting systems or at least just Excel and Word. Companies are different too. Some of companies are composed of plenty branches. Let’s name such as “Pyramid”. Pyramids have data synchronization issue for pile of IT systems. Software vendors and versions differ for branches significantly. In addition management company continuously modify reporting requirements that causes frustration assaults in the branches. This is a story about the project I happened to encounter chaos that needed to be systematized and automated. Low budget and tight deadlines limited the use of most existing industrial solutions but opened up scope for creativity.
Читать дальше →
Total votes 12: ↑12 and ↓0+12
Comments0

How to Painlessly Unite Art with Java, JavaScript, and Graphs or The Story Behind Creating an Interactive Theatre Produc

Reading time9 min
Views1.4K
Last year 2018, a theatre production series called Tale of the Century was launched in Estonia. Throughout the year, 22 local theatres presented their interpretations of the past hundred years of Estonian history to the audiences. In the draw, the Russian Theatre was assigned the topic of the future of Estonia.

Total votes 18: ↑17 and ↓1+16
Comments2

What to think during NALSD interview

Reading time7 min
Views9.1K
There are a lot of posts about what a typical coding interview at Google looks like. But, while not as widely described and discussed, there is also quite often a system design interview. For an SRE position it’s NALSD: non-abstract large system design. The key difference between SWE and SRE interviews consists in these two letters: NA.

So, what is the difference? How to be prepared for this interview? Let’s be non-abstract, and use an example. To be more non-abstract, let’s take something from the material world, such that you won’t be asked the exact same thing at the real interview (at least, not at the Google interview) :)

So, let’s design a public library system. For the paper books, like you have seen everywhere around. The whole text below was written all at once within around one hour, to roughly show you the areas that you should be able to cover / touch during the interview. Please excuse some disorder, that’s how I think (therefore I am).
Read more →
Total votes 26: ↑24 and ↓2+22
Comments0

Authors' contribution