Clickhouse Metabase

ClickHouse supports parametric aggregate functions, which accept other parameters in addition to columns, and combinators, which change the behavior of aggregate functions.

ClickHouse is an open-source, column-oriented database management system that can generate analytical reports in real time using SQL queries.

ClickHouse has come a long way since its inception three years ago.

Why does Mydbops recommend ClickHouse for analytics?

ClickHouse is a columnar store built for fast sort and search queries over very large volumes of data.

In columnar database systems, values from different columns are stored separately and values from the same column are stored together, which benefits analytical queries (ORDER BY, GROUP BY, and aggregations).

Columnar stores are well suited to analytics because they can read only the columns a query needs, instead of reading entire rows and filtering out the unneeded data, which makes data access faster.

ClickHouse integrates easily with MySQL and other database engines (see: MySQL and ClickHouse data migration).

Need for Backup and Restore:


As DBAs, it is our responsibility to back up the data regularly for safety.

If the database crashes or a fatal error occurs, a backup is the only way to restore the data and keep the loss to a minimum.

There are multiple ways of taking a backup, but each has its own shortcomings. We will discuss the following two methods and how to back up and restore with each:

ClickHouse client
clickhouse-backup tool

Method 1 (Using ClickHouse Client):

ClickHouse Client is a simple way to back up and restore data in ClickHouse without any additional tooling. Here we back up the metadata and the data separately.

Metadata Backup:

In this example, I dump the structure of the table “test_table” from the database “testing” in the TabSeparatedRaw format. This format is only appropriate for outputting a query result, not for parsing (reading data back to insert into a table), because rows are written without escaping.
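A minimal sketch of such a metadata dump, assuming clickhouse-client is on the PATH and can connect with its default settings. The helper name dump_schema and the output file name are hypothetical.

```shell
# Dump the CREATE TABLE statement of one table to <table>_schema.sql.
# dump_schema is a hypothetical helper name; connection flags are omitted.
dump_schema() {
  db="$1"
  table="$2"
  clickhouse-client \
    --query="SHOW CREATE TABLE ${db}.${table}" \
    --format=TabSeparatedRaw > "${table}_schema.sql"
}

# Usage: dump_schema testing test_table
```

TabSeparatedRaw keeps the statement unescaped, so the resulting file stays readable as plain SQL.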

Metadata Restore:

I have created a database named “testing1” and will restore the metadata backup taken earlier into it.

Restoring the backup:

Metadata Validation:

Here is a comparison of the table structure from the dump file and from the restored table:

Data Backup:

Before dumping the data, let us validate the number of records to be backed up by running a count.

Here I dump the table “test_table” in the TabSeparated (TSV) format. In this format, data is written row by row. Each row contains the values separated by tabs. Values are written in text format, without enclosing quotation marks, and with special characters escaped.
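A minimal sketch of the count and the TSV dump, assuming clickhouse-client can connect with its default settings. The helper names count_rows and dump_data and the output file name are hypothetical.

```shell
# count_rows and dump_data are hypothetical helper names.

# Print the number of rows in <db>.<table>.
count_rows() {
  clickhouse-client --query="SELECT count() FROM $1.$2"
}

# Write all rows of <db>.<table> to <table>.tsv in TabSeparated format.
dump_data() {
  clickhouse-client \
    --query="SELECT * FROM $1.$2 FORMAT TabSeparated" > "$2.tsv"
}

# Usage:
#   count_rows testing test_table
#   dump_data  testing test_table   # writes test_table.tsv
```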

Data Restore:

We need to ensure that the database and the table (metadata) have been created. The table structure must be the same as that of the source table. Once the metadata is restored, the data dump is restored.

Once the data dump was restored, I cross-checked the record count of the restored table against the dump file.
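A minimal sketch of the restore and the count check, assuming the target table already exists with the same structure as the source. The helper name restore_data is hypothetical.

```shell
# restore_data is a hypothetical helper name.
# Load a TabSeparated dump into <db>.<table>, then print the row count
# so it can be compared with the count taken before the backup.
restore_data() {
  db="$1"
  table="$2"
  file="$3"
  clickhouse-client \
    --query="INSERT INTO ${db}.${table} FORMAT TabSeparated" < "$file" &&
  clickhouse-client --query="SELECT count() FROM ${db}.${table}"
}

# Usage: restore_data testing1 test_table test_table.tsv
```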

We can write an automated program to back up the metadata and data of each table. The recovery procedure has to be scripted as well.
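One way such automation might look is sketched here, assuming table names are simple identifiers that need no quoting. The helper name backup_database and the file layout are hypothetical.

```shell
# backup_database is a hypothetical helper name. For every table in the given
# database it writes <table>_schema.sql and <table>.tsv into the output directory.
backup_database() {
  db="$1"
  out_dir="$2"
  mkdir -p "$out_dir"
  clickhouse-client --query="SHOW TABLES FROM ${db}" |
  while IFS= read -r table; do
    # </dev/null keeps the inner calls from consuming the table list on stdin
    clickhouse-client --query="SHOW CREATE TABLE ${db}.${table}" \
      --format=TabSeparatedRaw > "${out_dir}/${table}_schema.sql" < /dev/null
    clickhouse-client --query="SELECT * FROM ${db}.${table} FORMAT TabSeparated" \
      > "${out_dir}/${table}.tsv" < /dev/null
  done
}

# Usage: backup_database testing /path/to/backup-dir
```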

Method 2 (clickhouse-backup):

clickhouse-backup is a tool for easy backup and restore with S3 (AWS) and GCS support. It is an open-source tool available on GitHub.

Features

Supports full and incremental backups.
Supports AWS, GCS, and Alibaba Cloud object stores.
Easy configuration with environment variables.
Supports backup administrative tasks such as list, delete, and download.

Run the clickhouse-backup tool as the root user or the clickhouse user.

GLOBAL OPTIONS:

Default Config Path:

The default config path is defined in the location /var/lib/clickhouse/backup/

Important Note:

We shouldn’t change the file permissions for the default path /var/lib/clickhouse/backup/, because this path contains hard links. Changing the permissions or ownership of a hard link in this path also changes them on the corresponding ClickHouse data files, which can lead to data corruption.

Config File:

All options can be overridden via environment variables.

Backing up the data with the tool:

With the backup tool, I used the “create” option to create a new backup.

By default, when creating a backup, the tool creates the metadata and shadow folders under the backup directory.

The metadata directory contains the metadata files, i.e. the table structure.

The shadow directory contains the data files.

By default, the dump is stored under /var/lib/clickhouse/backup/.

[root@mydbopslabs202 testing]# cat test_table.sql

Listing the dump files:

We can list the backups using the “list” option of the backup tool. It shows each dump with its creation date and time.

Restoring the dump file using the clickhouse-backup tool:

“restore” is the option that restores data from a dump into the ClickHouse server.

While restoring, the backup tool first restores the metadata (the structure of the table) from the dump file in the metadata directory. Once the metadata is restored, it prepares the data by restoring the data files from the shadow directory into the table’s detached directory. Finally, it runs ALTER TABLE…ATTACH PART, which adds the data from the detached directory to the table.
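The create, list, and restore options can be sketched as thin wrappers, assuming clickhouse-backup is installed and run as root or the clickhouse user. The helper names backup_and_list and restore_backup and the backup name are hypothetical.

```shell
# backup_and_list and restore_backup are hypothetical helper names.

# Create a named backup, then list the existing backups.
backup_and_list() {
  name="$1"
  clickhouse-backup create "$name" &&
  clickhouse-backup list
}

# Restore a named backup: metadata first, then the data parts are attached.
restore_backup() {
  clickhouse-backup restore "$1"
}

# Usage:
#   backup_and_list my_backup
#   restore_backup  my_backup
```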

Validating the logs from the restore:

A notable advantage of the backup tool is that it separates the metadata structure and the data files into separate folders, the metadata directory and the shadow directory, under the backup directory. As mentioned earlier, the table structure is in the metadata directory and the data files are in the shadow directory.

The backup tool has some drawbacks: the maximum backup size on remote storage is 5 TB, and the tool supports only the MergeTree family of table engines.

These are simple ways to back up and restore data from a ClickHouse server. We can choose the backup method based on our requirements, our environment, and the size of the data. ClickHouse-Copier is another way to take a backup. In the coming days, we will discuss more about ClickHouse.

Running Metabase on Docker (v0.39.0.1)

Metabase provides an official Docker image via Docker Hub that can be used for deployments on any system that is running Docker.

If you’re trying to upgrade your Metabase version on Docker, check out these upgrading instructions.

Launching Metabase on a new container

Here’s a quick one-liner to get you off the ground (please note, we recommend further configuration for production deployments below):

This will launch a Metabase server on port 3000 by default. You can use docker logs -f metabase to follow the rest of the initialization progress. Once the Metabase startup completes, you can access the app at localhost:3000.

Since Docker containers have their own ports and we just map them to the system ports as needed, it’s easy to move Metabase onto a different system port if you wish. For example, running Metabase on port 12345:
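A minimal sketch of the launch command with a configurable host port, assuming the official image metabase/metabase and a container named metabase. The helper name start_metabase is hypothetical.

```shell
# start_metabase is a hypothetical helper name.
# Map the given host port (default 3000) to Metabase's container port 3000.
start_metabase() {
  host_port="${1:-3000}"
  docker run -d -p "${host_port}:3000" --name metabase metabase/metabase
}

# Usage:
#   start_metabase          # app at localhost:3000
#   start_metabase 12345    # app at localhost:12345
```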

Mounting a mapped file storage volume

In its default configuration Metabase uses the local filesystem to run an H2 embedded database to store its own application data. The end result is that your Metabase application data will be on disk inside your container and lost if you ever remove the container.

To persist your data outside of the container and make it available for use between container launches we can mount a local file path inside our container.

Now when you launch your container we are telling Metabase to use the database file at ~/metabase-data/metabase.db instead of its default location and we are mounting that folder from our local filesystem into the container.
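The mount described above can be sketched as follows, assuming the host folder ~/metabase-data and the in-container mount point /metabase-data. The helper name start_metabase_persistent is hypothetical.

```shell
# start_metabase_persistent is a hypothetical helper name.
# Mount a host folder into the container and point MB_DB_FILE at it,
# so the H2 application database survives container removal.
start_metabase_persistent() {
  data_dir="${1:-$HOME/metabase-data}"
  docker run -d -p 3000:3000 \
    -v "${data_dir}:/metabase-data" \
    -e "MB_DB_FILE=/metabase-data/metabase.db" \
    --name metabase metabase/metabase
}

# Usage: start_metabase_persistent ~/metabase-data
```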

Getting your config back if you stopped your container

If you have previously run and configured your Metabase using the local Database and then stopped the container, your data will still be there unless you deleted the container with the docker rm command. To recover your previous configuration:

Find the stopped container using the docker ps -a command. It will look something like this:

Once you have identified the stopped container with your configuration in it, save the container ID from the leftmost column for the next step.

Use docker commit to create a new custom docker image from the stopped container containing your configuration.

Run your new image using docker run to get up and running again.
Hopefully you have your previously configured Metabase Installation back. If it’s not the one you expected try a different stopped container and do these steps again.
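The recovery steps above can be sketched as follows. The image name mycompany/metabase-custom and the container name metabase-recovered are placeholders, and the helper names are hypothetical; a new container name is used so it does not clash with the stopped container.

```shell
# list_containers and recover_metabase are hypothetical helper names.

# Step 1: show all containers, including stopped ones, to find the container ID.
list_containers() {
  docker ps -a
}

# Steps 2 and 3: snapshot the stopped container into a new image, then run it.
recover_metabase() {
  container_id="$1"
  image="${2:-mycompany/metabase-custom}"
  docker commit "$container_id" "$image" &&
  docker run -d -p 3000:3000 --name metabase-recovered "$image"
}

# Usage: recover_metabase <container-id>
```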

Using Postgres as the Metabase application database

If you are ready to completely move off the H2 embedded database for running Metabase and prefer to use Postgres we’ve got that covered too.

In this scenario all you need to do is make sure you launch Metabase with the correct environment variables containing your Postgres database connection details and you’re all set. For example:

Keep in mind that Metabase will be connecting from within your docker container, so make sure that either you’re using a fully qualified hostname or that you’ve set a proper entry in your container’s /etc/hosts file.
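A minimal sketch of such a launch, assuming a Postgres database named metabase listening on port 5432. The helper name start_metabase_postgres is hypothetical, and the host, user, and password are supplied by the caller.

```shell
# start_metabase_postgres is a hypothetical helper name.
# Pass the Postgres connection details to Metabase as MB_DB_* variables.
start_metabase_postgres() {
  db_host="$1"
  db_user="$2"
  db_pass="$3"
  docker run -d -p 3000:3000 \
    -e "MB_DB_TYPE=postgres" \
    -e "MB_DB_DBNAME=metabase" \
    -e "MB_DB_PORT=5432" \
    -e "MB_DB_USER=${db_user}" \
    -e "MB_DB_PASS=${db_pass}" \
    -e "MB_DB_HOST=${db_host}" \
    --name metabase metabase/metabase
}

# Usage: start_metabase_postgres my-database-host <username> <password>
```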

Migrating from H2 to Postgres as the Metabase application database

For general information, see instructions for migrating from H2 to MySQL or Postgres.

To migrate an existing Metabase container from an H2 application database to another database container (e.g. Postgres, MySQL), there are a few considerations to keep in mind:

The target database container must be accessible (i.e. on an available network)
The target database container must be supported (e.g. MySQL, Postgres)
The existing H2 database should be mapped outside the running container

The migration process involves 2 main steps:

Stop the existing Metabase container
Run a new, temporary Metabase container to perform the migration

Using a Postgres container as the target, here’s an example invocation:

To further explain the example: in addition to specifying the target database connection details, set the MB_DB_FILE environment variable for the source H2 database location, and pass the argument load-from-h2 to begin migrating.
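A minimal sketch of the temporary migration container, assuming the H2 file is metabase.db inside a host directory that is mounted at /metabase-data. The helper name migrate_h2_to_postgres and the container name metabase-migration are hypothetical.

```shell
# migrate_h2_to_postgres is a hypothetical helper name.
# Run a temporary container that reads the H2 file named by MB_DB_FILE and
# loads it into the Postgres target described by the other MB_DB_* variables.
migrate_h2_to_postgres() {
  h2_dir="$1"
  db_host="$2"
  db_user="$3"
  db_pass="$4"
  docker run --name metabase-migration \
    -v "${h2_dir}:/metabase-data" \
    -e "MB_DB_FILE=/metabase-data/metabase.db" \
    -e "MB_DB_TYPE=postgres" \
    -e "MB_DB_DBNAME=metabase" \
    -e "MB_DB_PORT=5432" \
    -e "MB_DB_USER=${db_user}" \
    -e "MB_DB_PASS=${db_pass}" \
    -e "MB_DB_HOST=${db_host}" \
    metabase/metabase load-from-h2
}

# Usage: migrate_h2_to_postgres /path/to/h2-dir my-database-host <username> <password>
```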

Setting the Java Timezone

It’s best to set your Java timezone to match the timezone you’d like all your reports to come in. You can do this by simply specifying the JAVA_TIMEZONE environment variable which is picked up by the Metabase launch script. For example:
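A minimal sketch, assuming US/Pacific as an illustrative timezone. The helper name start_metabase_tz is hypothetical.

```shell
# start_metabase_tz is a hypothetical helper name.
# JAVA_TIMEZONE is read by the Metabase launch script inside the container.
start_metabase_tz() {
  tz="${1:-US/Pacific}"
  docker run -d -p 3000:3000 \
    -e "JAVA_TIMEZONE=${tz}" \
    --name metabase metabase/metabase
}

# Usage: start_metabase_tz Europe/Helsinki
```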

Additional custom settings

While running Metabase on docker you can use any of the custom settings from Customizing the Metabase Jetty Webserver by setting environment variables on your docker run command.

In addition to the standard custom settings there are two Docker-specific environment variables, MUID and MGID, which set the user and group IDs used by Metabase when running in a Docker container. These settings make it possible to match file permissions when files, such as the application database, are shared between the host and the container.

Here’s how to use a database file, owned by your account, that is stored in your home directory:
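A minimal sketch, assuming the database lives in ~/my-metabase-db on the host and is mounted at /metabase.db in the container. The helper name start_metabase_as_me is hypothetical, and id -u / id -g supply the current user and group IDs.

```shell
# start_metabase_as_me is a hypothetical helper name.
# Run Metabase with the caller's user and group IDs so that files in the
# shared folder keep matching ownership on the host.
start_metabase_as_me() {
  docker run -d -p 3000:3000 \
    -v "$HOME/my-metabase-db:/metabase.db" \
    -e "MB_DB_FILE=/metabase.db" \
    -e "MUID=$(id -u)" \
    -e "MGID=$(id -g)" \
    --name metabase metabase/metabase
}

# Usage: start_metabase_as_me
```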

Now that you’ve installed Metabase, it’s time to set it up and connect it to your database.

Copying the application database

If you forgot to configure the application database, it will be located at /metabase.db/metabase.db.mv.db in the container. You can copy this whole directory out of the container using the following command (replacing CONTAINER_ID with the actual container ID or name, metabase if you named the container):

The DB contents will be left in a directory named metabase.db. Note that some older versions of Metabase stored their DB in a different default location.
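A minimal sketch of the copy, assuming the container is named metabase unless another ID or name is given. The helper name copy_app_db is hypothetical.

```shell
# copy_app_db is a hypothetical helper name.
# Copy the /metabase.db directory out of the container into the current directory.
copy_app_db() {
  container="${1:-metabase}"
  docker cp "${container}:/metabase.db" ./
}

# Usage: copy_app_db CONTAINER_ID
```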

Fixing OutOfMemoryErrors in some hosted environments

On some hosts Metabase can fail to start with an error message like:

If that happens, you’ll need to set a JVM option to manually configure the maximum amount of memory the JVM uses for the heap. Refer to these instructions for details on how to do that.

Adding external dependencies or plugins

To add external dependency JAR files such as the Oracle or Vertica JDBC drivers or 3rd-party Metabase drivers, you will need to create a plugins directory in your host system and bind it so it is available to Metabase as the path /plugins using either --mount or -v/--volume. For example, if you have a directory named /path/to/plugins on your host system, you can make its contents available to Metabase using the --mount option as follows:

Note that Metabase will use this directory to extract plugins bundled with the default Metabase distribution (such as drivers for various databases such as SQLite), thus it must be readable and writable by Docker.
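The bind mount described above can be sketched as follows, assuming the host directory already exists and is readable and writable by Docker. The helper name start_metabase_with_plugins is hypothetical.

```shell
# start_metabase_with_plugins is a hypothetical helper name.
# Bind-mount a host directory at /plugins so Metabase can load extra drivers.
start_metabase_with_plugins() {
  plugins_dir="$1"
  docker run -d -p 3000:3000 \
    --mount "type=bind,source=${plugins_dir},destination=/plugins" \
    --name metabase metabase/metabase
}

# Usage: start_metabase_with_plugins /path/to/plugins
```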


Use Docker Secrets to hide the sensitive parameters

In order to keep your connection parameters hidden from plain sight, you can use Docker Secrets to put all parameters in files so Docker can read and load them in memory before the container is started.

This is an example of a docker-compose.yml file to start a Metabase container with secrets to connect to a PostgreSQL database. Create 2 files (db_user.txt and db_password.txt) in the same directory as this yml and fill them with any username and a secure password:
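The two secret files themselves can be created from the shell. This is a minimal sketch covering only those files, in which the username metabase_user is a placeholder and the password is generated randomly:

```shell
# Create db_user.txt and db_password.txt in the current directory
# (the directory that holds docker-compose.yml).
(
  umask 077                                  # new files are readable by the owner only
  printf '%s' 'metabase_user' > db_user.txt  # placeholder username
  # 24 random bytes, base64-encoded, written without a trailing newline
  head -c 24 /dev/urandom | base64 | tr -d '\n' > db_password.txt
)
```

The subshell keeps the stricter umask from affecting the rest of the session.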