Training

General stuff

All of our trainings are held by Sönke Liebau or Lars Francke. For the hands-on parts we work with a realistic cluster in the cloud, not with VMs on your laptops. All you need to participate in the hands-on exercises is a laptop that can connect to our WiFi.

We offer our trainings in-house as well as publicly. Our fees are transparent and simple:

  • In-house courses are 4.000€ (net) per day (plus expenses) for up to ten participants, each further participant is 200€ a day extra
  • Public courses are 600€ (net) per day and participant
  • Our public courses run as soon as three people have signed up; there is a maximum of twelve per course
  • For in-house courses it's up to you how many people you'd like to have per course, but we recommend a maximum of 15 so we can look after each participant

Why choose OpenCore for your trainings?

  • Our trainers: We (Sönke & Lars) have been in the Big Data & Hadoop field for about ten years and are active in many of the projects we talk about (e.g. Lars is an HBase & Hive committer). We also work on-site with customers and use the tools almost daily
  • Our customers: We have lots of happy customers and have built relationships over the years across all industries and countries of the world
  • Up to date: The field is fast-moving and we invest lots of time to keep our courses and knowledge up to date
  • Independent: We're not tied to a specific distribution and can talk about the pros & cons of all the tools, including stuff that's not in the distros

In-house or public?

If you want a training for your team, we recommend doing it in-house. This allows us to be flexible with the contents and the agenda according to your in-house wishes and technologies. This way we can also talk about internal details that you can't discuss in public.

Cancellation fees

  • If you cancel more than two weeks before the training, we'll bill you 10% of the costs plus all non-refundable expenses
  • If you cancel less than two weeks before the training, we'll bill you 50% of the costs plus all non-refundable expenses
  • If you cancel on the day of the training, we'll bill you the full amount plus all non-refundable expenses

What we offer

  • On request: Certificate of participation
  • The slides and exercises as PDF files
  • On request: You can get updated material for another six months after the training
  • For all further trainings that you book within a year of the previous one we'll give you a discount of 10%

Customized Trainings

The trainings we offer cover a broad spectrum of topics. But we're fully aware that for you and your use-case a different combination of topics might be even more interesting or economical. Therefore, we're more than happy to prepare and deliver a custom training for your needs. Just contact us if you're interested.

Requirements for participants

  • Laptop with WiFi for the exercises
    • possibly a Java IDE (e.g. IntelliJ IDEA)
  • Curiosity and interest in the topic

Requirements for the room for in-house trainings

  • Table with power for each participant
  • Projector
  • Internet (WiFi or cable is fine, we'll bring a WiFi-Repeater)
  • Whiteboard
    • incl. marker and eraser
    • alternatively, a flipchart
  • Drinks and food or a plan for lunch

Portfolio

Hadoop & Big Data Basics (1-3 days)

In this training we'll teach you about the Big Data ecosystem around the Apache Hadoop project. Hadoop is the world's most widely used framework in the big data space. It consists of three components: distributed storage of data, distributed computation, and management of cluster resources. But that's not all: the ecosystem contains hundreds of tools and frameworks, and we'll present the most important ones in this training. The training is independent of a specific distribution.

The hands-on part consists of simple commands to access HDFS as well as Spark jobs and SQL on Hadoop using Hive.
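To give a flavour of the hands-on part, the exercises revolve around commands along these lines (an illustrative sketch; file names, job names, and table names are placeholders, and the exact datasets differ per course):

```shell
# Create a directory in HDFS and upload a sample file (paths are examples)
hdfs dfs -mkdir -p /user/training/input
hdfs dfs -put access.log /user/training/input/
hdfs dfs -ls /user/training/input

# Submit a Spark application to the cluster via YARN
spark-submit --master yarn --deploy-mode client my_job.py

# Run SQL on Hadoop through Hive's JDBC client (connection URL is an example)
beeline -u jdbc:hive2://hiveserver:10000 \
  -e "SELECT COUNT(*) FROM access_logs;"
```

These commands require a running Hadoop cluster; in the training you'll run them against our cloud cluster.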

We offer this training with a length of 1-3 days:

  • In one day we'll present Hadoop itself in detail but will only talk briefly about all the other tools
  • In two days we have more time for questions and the other tools from the ecosystem
  • In three days we can talk about things like administration or security as well as more extensive exercises. Also, an introduction to both big distributions (Cloudera CDH, Hortonworks HDP) is possible

Agenda

  • Big Data Introduction, short history
  • Apache Hadoop introduction (HDFS, YARN, MapReduce)
  • Introduction to the ecosystem (e.g. HBase, Spark, Kafka, Solr, and many more)
  • Optional: Administration and Security

HBase (3 days)

Training to go with the book! We'll introduce you to HBase and present the contents of the book "HBase: The Definitive Guide (Second Edition)", written by our partner Lars George. We also have hands-on exercises to help you test what you've learned, including high-availability tests.

The training is delivered by an HBase committer.

Agenda

  • Basics: What is HBase and what can I use it for?
  • How do I install it and what do I have to look out for when deploying it?
  • Client API: How do I access HBase as a user (and administrator)?
  • Extended Features: Filters, Counters, Coprocessors
  • Alternative Clients (REST, Thrift etc.)
  • Hadoop Integration
  • Internals Deep-Dive: How does HBase work?
    • This can be important to know if you want to use HBase optimally
  • Data Modeling/Architecture with HBase: Key Design etc.
  • Monitoring & Tuning
  • Administration (Backup, Replication, Decommissioning etc.)
  • Security
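As a small taste of the client API portion of the agenda, the basic operations look like this in the interactive HBase shell (the table and column names here are made up for illustration):

```shell
# Start the interactive HBase shell on a cluster node
hbase shell

# Inside the shell: create a table with one column family,
# write a cell, read it back, and scan the whole table
create 'users', 'info'
put 'users', 'row1', 'info:name', 'Alice'
get 'users', 'row1'
scan 'users'
```

In the training you'll also use the Java client API for the same operations; the shell is simply the quickest way to experiment.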

Data Ingestion (2 days)

We expect that you know the contents of the Hadoop & Big Data Basics training.

In two days we'll teach you the tools, frameworks and typical patterns used to load data into a Hadoop cluster.

We'll show you the most frequently used tools across distributions: Kafka (Streams & Connect), Spark (Streaming, Structured Streaming), HBase, Hive/Impala, HDFS, Sqoop, Oozie, Flume, NiFi. For each of these tools we'll talk about their strengths and weaknesses as well as their position in the market and their state in the distributions. Afterwards we'll teach you typical data ingestion patterns in the Big Data/Hadoop world and build a few simple flows in our hands-on exercises.

Kafka (2 days)

In this training we'll teach you the basic components and principles of Kafka, what you need to know about its administration, and give you an introduction to the various tools from its ecosystem. We'll also talk about the differences between Apache Kafka and the Confluent Platform.

You'll learn your way around the command line interface to manage a cluster as well as best practices for your daily business. Participants will learn the theory of Kafka in depth but will also apply what they've learned on a real cluster. Once we've covered the basics, you'll get an overview of the existing third-party tools that ease the burden of administration and of the options for monitoring Kafka.

In addition to Kafka itself there are various tools in the ecosystem that are often used together. The course introduces you to the following projects in theory and practice:

  • Kafka Connect
  • Schema Registry
  • REST Proxy
  • Kafka Streams & KSQL

For the hands-on part we'll use the Confluent Platform, but we'll mention the relevant differences to the Apache Kafka project.
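The command-line work in the exercises centres on tools like these (a sketch; the broker address and topic name are placeholders, and the commands shown use the Confluent Platform naming, without the `.sh` suffix used by plain Apache Kafka):

```shell
# Create a topic with 3 partitions and a replication factor of 2
kafka-topics --bootstrap-server broker1:9092 \
  --create --topic events --partitions 3 --replication-factor 2

# Produce a few test messages interactively from the console
kafka-console-producer --bootstrap-server broker1:9092 --topic events

# Consume the messages again from the beginning of the topic
kafka-console-consumer --bootstrap-server broker1:9092 \
  --topic events --from-beginning
```

These commands require a running Kafka cluster, which we provide in the cloud for the hands-on part.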

Big Data Dinner

There are many stories, misconceptions and myths around "Big Data" that often lead to failed or delayed projects. Projects that don't deliver what you'd hoped for or run over budget. This casual event is meant for managers who make decisions about Big Data projects at their companies. Our goal is to teach you the basics of what's possible today with "Big Data" (and what's not) as well as clear up a few of the common misconceptions.

The participants will learn everything they need so that they can ask the right questions at the right time and steer projects in the right direction.

Also - not unimportant - you'll get to have good food and the chance to network with your "comrades in misery".