Data Engineering *

discuss data collection and preparation

Articles Posts News Authors

lukyanchikov Sep 20 2020 at 23:33

InterSystems IRIS – the All-Purpose Universal Platform for Real-Time AI/ML

22 min

952

InterSystems corporate blogMachine learning*DevOps*Artificial IntelligenceData Engineering*

Author: Sergey Lukyanchikov, Sales Engineer at InterSystems

Challenges of real-time AI/ML computations

We will start from the examples that we faced as Data Science practice at InterSystems:

A “high-load” customer portal is integrated with an online recommendation system. The plan is to reconfigure promo campaigns at the level of the entire retail network (we will assume that instead of a “flat” promo campaign master there will be used a “segment-tactic” matrix). What will happen to the recommender mechanisms? What will happen to data feeds and updates into the recommender mechanisms (the volume of input data having increased 25000 times)? What will happen to recommendation rule generation setup (the need to reduce 1000 times the recommendation rule filtering threshold due to a thousandfold increase of the volume and “assortment” of the rules generated)?
An equipment health monitoring system uses “manual” data sample feeds. Now it is connected to a SCADA system that transmits thousands of process parameter readings each second. What will happen to the monitoring system (will it be able to handle equipment health monitoring on a second-by-second basis)? What will happen once the input data receives a new bloc of several hundreds of columns with data sensor readings recently implemented in the SCADA system (will it be necessary, and for how long, to shut down the monitoring system to integrate the new sensor data in the analysis)?
A complex of AI/ML mechanisms (recommendation, monitoring, forecasting) depend on each other’s results. How many man-hours will it take every month to adapt those AI/ML mechanisms’ functioning to changes in the input data? What is the overall “delay” in supporting business decision making by the AI/ML mechanisms (the refresh frequency of supporting information against the feed frequency of new input data)?

varunbhagat1 Aug 31 2020 at 10:22

Data Science vs AI: All You Need To Know

4 min

.NET*Angular*DevOps*Artificial IntelligenceData Engineering*

What do these terms mean? And what is the difference?

Data Science and Artificial Intelligence are creating a lot of buzzes these days. But what do these terms mean? And what is the difference between them?

While the terms Data Science and Artificial Intelligence (AI) comes under the same domain and are inter-connected to each other, they have their specific applications and meaning.

There’s no slowing down the spread of AI and data science. Many big tech giants are extensively investing in these technologies. As per the recent survey, it is estimated that artificial intelligence could add $15.7 trillion to the global economy by 2030.

Through this piece of writing, I will be explaining about the AI and data science concepts and their differences in detail. So, without wasting any more time, let’s get started!

PastorGL Jan 31 2020 at 19:55

Introducing One Ring — an open-source pipeline for all your Spark applications

23 min

1.5K

Open source*Java*Big Data*Hadoop*Data Engineering*

If you utilize Apache Spark, you probably have a few applications that consume some data from external sources and produce some intermediate result, that is about to be consumed by some applications further down the processing chain, and so on until you get a final result.

We suspect that because we have a similar pipeline with lots of processes like this one:

A process flowchart with more than 50 applications and about 70 datasets
Click here for a bit larger version

Each rectangle is a Spark application with a set of their own execution parameters, and each arrow is an equally parametrized dataset (externally stored highlighted with a color; note the number of intermediate ones). This example is not the most complex of our processes, it’s fairly a simple one. And we don’t assemble such workflows manually, we generate them from Process Templates (outlined as groups on this flowchart).

So here comes the One Ring, a Spark pipelining framework with very robust configuration abilities, which makes it easier to compose and execute a most complex Process as a single large Spark job.

And we just made it open source. Perhaps, you’re interested in the details.

We got you covered!