Last fall we introduced Custom Science applications at the Keboola Meetup; the feature was exactly one day old at the time. Since then we have added many improvements (support for private repositories, encryption, error handling and documentation) and fixed many bugs. We have now also improved the UI, so this post summarizes what is currently possible.
Custom Science is the easiest way ever to integrate arbitrary code (in R, Python and now also PHP) into your KBC projects. To create a CS application, you just need to provide a Git repository containing the code that transforms your data, and we will take care of the rest.
Sounds similar to Transformations? Good, because it really is, but with some awesome twists. With Custom Science:
- you can quickly integrate a 3rd party application or API
- the developer of the application does not need to have access to your project
- you do not need to have access to the application code
- you have much more freedom in the code organization
What is it good for?
The point of all this is that you can hire a 3rd party developer, point them to our documentation and let them do their work. When they are done, you can integrate their application with a few clicks. We see this as the ultimate connection between you and hardcore Ph.D. data scientists who have the algorithms you are looking for (but who do not have the infrastructure to run them in a usable form). We care about protecting you, which is why those developers do not need access to your project. If you are really concerned, then you can also isolate the application completely from any network. We also care about protecting the developers: if they share a Git repository with you, it can be a private repository, and they do not have to share the password with you. We don't want to interfere with anyone's business and we try to stay out of the way as much as possible, so what the application does is entirely up to the agreement between you and the 3rd party.
You might also consider using custom applications for your own transformation code. We have long been aware that some of your transformations are really complicated. You can now take a complex transformation, split it into several files or classes (or whatever you want), and run tests on it. Again, your code can be stored in private Git repositories and protected if you want. This way, you can also share transformation code between multiple KBC projects.
Differences from Transformations
- To create Transformation code, you need access to the KBC project; to create Custom Science code, you don't.
- Transformation code is accessible to anyone in the project; Custom Science code can be hidden.
- Transformation code must be stacked into a single script; Custom Science code can be organized freely.
- Transformations are tied to a project; Custom Science code lives in a separate repository and can be shared between projects.
- Transformations are versioned as changes in the configuration in the KBC project; Custom Science is versioned using tags in a Git repository.
- Transformations should have input and output; Custom Science does not need them, so it can take the role of an extractor or a writer.
- Transformations have no parameters; Custom Science can be parametrized.
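To make the last two points more concrete, here is a minimal sketch of what a parametrized Custom Science script in Python might look like. It is only an illustration under stated assumptions: the data folder layout (`config.json` with a `parameters` key, input tables in `in/tables`, output tables in `out/tables`), the table names `source.csv` and `result.csv`, and the parameters `column` and `multiplier` are all hypothetical here. Check the Custom Science documentation for the actual interface.

```python
import csv
import json
from pathlib import Path


def run(data_dir: str = "/data") -> int:
    """Multiply one numeric column by a factor taken from the configuration.

    The folder layout and the parameter names are assumptions made for
    this sketch, not a confirmed description of the platform interface.
    """
    base = Path(data_dir)

    # Assumed: user-supplied parameters live under the "parameters" key.
    config = json.loads((base / "config.json").read_text())
    params = config.get("parameters", {})
    column = params.get("column", "value")            # hypothetical parameter
    multiplier = float(params.get("multiplier", 1))   # hypothetical parameter

    source = base / "in" / "tables" / "source.csv"        # hypothetical input table
    destination = base / "out" / "tables" / "result.csv"  # hypothetical output table
    destination.parent.mkdir(parents=True, exist_ok=True)

    rows_written = 0
    with source.open(newline="") as src, destination.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Scale the configured column, leave every other column untouched.
            row[column] = str(float(row[column]) * multiplier)
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

Because the logic sits in a plain function that takes the data folder as an argument, the same repository can hold unit tests that call `run()` against a temporary folder, and different projects can reuse the code with different parameter values.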
Q & A
Why is it called Custom Science?
Because it is designed to connect you with a 3rd party (data) scientist who can provide you with Science customized to your needs.
Will I have to rewrite all Transformations to Custom Science?
Certainly not. Transformations are here to stay. Custom Science is another option, not a replacement.
Will Custom Science be limited to R, Python and PHP?
It depends on demand. If you require another language, let us know. So far we have received requests for R, Python 3.x, Python 2.x and PHP, so those are the ones we support.
What are the differences in the code between Transformations and Custom Science?
Almost none. There are minor differences in handling packages (they are installed automatically in R/Python Transformations and have to be installed manually in CS) and in handling file inputs.
I made a Custom Science application. Can I share it with other people?
- Step by step tutorial
- Custom Science in R - examples, testing, integration with Travis CI
- Custom Science in Python - examples, testing, integration with Travis CI
- Custom Science in PHP - basic examples