Parallel access to external data sources from Greenplum DB using PXF

Greenplum 5.0.0 brought us a lot of new features. Most of them were planned long ago but could not be implemented without breaking backward binary compatibility, which is not allowed within the 4.X major branch. One such feature is the new PXF framework. It lets you integrate a Greenplum cluster with other systems: databases, in-memory grids, Hadoop components, etc. Moreover, it can do so in parallel: every Greenplum segment can retrieve its own shard of the data.

Apache Zeppelin vs Jupyter Notebook: comparison and experience

The deeper you go into data analysis, the more you understand that the most suitable tool for coding and visualizing is not pure code, or an SQL IDE, or even simplified data manipulation diagrams (aka workflows or jobs). At some point you realize that you need a mix of all of these, and that is what “notebook” platforms are. I have tried the two most powerful of them in production use with 20+ analytics users. My experience is described in this article.

Monitoring MPP systems

There are a lot of monitoring systems nowadays, but working with Massively Parallel Processing (MPP) databases showed me that, on their own, they are not enough to monitor complex data processing systems from both sides: data and hardware. For that purpose, I found a solution in combining multiple metric collection, visualization, and alerting systems.