Widely used data mining systems are expensive in...
...time (configuration, tuning, indexing, query run times)
...money (licensing, hardware, services)
...flexibility (must discard or index data to make queries tractable)
...or in some cases all of the above.
r17 is our response to these problems.
r17 provides brute-force performance so you're free to explore the data without a separate indexing step and without reducing data resolution.
r17 provides pipeline concurrency as well as two kinds of parallelism across machines.
Mix data from multiple sources without a separate import step. External applications can participate in queries through an easily parsed, efficient data format. Store data in r17's efficient binary format... or not, as you please.
r17 supports SELECT, WHERE and JOIN queries on data streams in real time.
r17 is a single executable. No wrestling with complex & fragile installation or configuration. It's easy to learn and uses familiar idioms.
r17's syntax is a cross between UNIX shell and SQL.
ls | grep "fred"
is roughly equivalent to
io.directory.list(".") | rel.where(file_name = "fred");
or (since 1.4.2) io.ls(".") | rel.where(file_name = "fred");
The | has the same meaning as in UNIX shell: the io.directory.list stream operator will execute concurrently with the rel.where operator.
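To make "execute concurrently" concrete, here is a minimal Python sketch of a two-stage pipeline. It is not r17's implementation. Each stage runs in its own thread, and the stages are connected by a bounded queue, much like a UNIX pipe buffer. The function names list_directory, where and run_pipeline are hypothetical stand-ins for io.directory.list and rel.where.

```python
import os
import queue
import threading

_END = object()  # sentinel object that marks the end of a stream


def list_directory(path, out):
    """Source stage: emit one row per directory entry (cf. io.directory.list)."""
    for name in sorted(os.listdir(path)):
        out.put({"file_name": name})
    out.put(_END)


def where(predicate, inp, out):
    """Filter stage: forward only rows that satisfy the predicate (cf. rel.where)."""
    while True:
        row = inp.get()
        if row is _END:
            out.put(_END)
            return
        if predicate(row):
            out.put(row)


def run_pipeline(path, predicate):
    """Run each stage in its own thread, connected by bounded queues."""
    # Bounded queues give back-pressure: a fast producer blocks instead of
    # buffering the whole stream in memory.
    q1 = queue.Queue(maxsize=64)
    q2 = queue.Queue(maxsize=64)
    stages = [
        threading.Thread(target=list_directory, args=(path, q1)),
        threading.Thread(target=where, args=(predicate, q1, q2)),
    ]
    for t in stages:
        t.start()
    results = []
    # Consume output while the upstream stages are still running.
    while (row := q2.get()) is not _END:
        results.append(row)
    for t in stages:
        t.join()
    return results
```

For example, run_pipeline(".", lambda row: row["file_name"] == "fred") returns the matching rows. CPython threads give concurrency here rather than CPU parallelism, which is enough to show that the filter starts working before the listing has finished.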
SELECT username, COUNT(1) AS num FROM users GROUP BY username ORDER BY num;
is roughly equivalent to
io.file.read('users') | rel.select(username) | rel.group(count) | rel.order_by(_count);
The most interesting difference is that each r17 clause will execute concurrently.
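As a reference point for what the query computes, the sketch below reproduces the result of select | group(count) | order_by with Python generators. It models results, not r17's concurrency or its actual API. A few things are assumptions:

- rel.group(count) is taken to group on every remaining column and add a _count column.
- rel.order_by sorts in ascending order.
- The users rows and their shell column are invented sample data.

```python
from collections import Counter


def select(rows, *columns):
    """Project each row down to the named columns (like SELECT col, ...)."""
    for row in rows:
        yield {c: row[c] for c in columns}


def group_count(rows):
    """Collapse identical rows and add a _count column (assumed rel.group(count) behaviour)."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    for key, n in counts.items():
        yield {**dict(key), "_count": n}


def order_by(rows, column):
    """Sort rows ascending on one column (like ORDER BY)."""
    return sorted(rows, key=lambda row: row[column])


# Hypothetical contents of the 'users' relation.
users = [
    {"username": "alice", "shell": "bash"},
    {"username": "bob", "shell": "zsh"},
    {"username": "alice", "shell": "zsh"},
    {"username": "carol", "shell": "bash"},
    {"username": "alice", "shell": "fish"},
    {"username": "bob", "shell": "bash"},
]

result = order_by(group_count(select(users, "username")), "_count")
# result == [{'username': 'carol', '_count': 1},
#            {'username': 'bob', '_count': 2},
#            {'username': 'alice', '_count': 3}]
```

The three stages here are lazy but run on a single thread. In r17 each clause runs as its own concurrent stage.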
And now for something that's quite difficult to do in UNIX shell or SQL alone:
sudo tail -f /var/log/apache2/access.log \
| r17 'rel.from_text(
"^([^ ]+?) [^ ]+? [^ ]+? \\[([^\\]]+?)\\]",
"string:ip_address", "string:date")
| rel.join.natural("interesting_ips.r17_native")
| rel.to_tsv();'
This parses the IP address and date from each line of the Apache access log, joins the result against a list of interesting IP addresses and converts the matches to tab-separated values (TSV), all in real time.
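For comparison, here is a rough Python sketch of the same three steps: regex parse, natural join and TSV output, applied to a stream of lines. It reuses the regular expression from the r17 example. Several behaviours are assumptions, not confirmed r17 behaviour:

- Non-matching lines are skipped.
- The TSV output has no header.
- The interesting IP list is an in-memory list of rows with a hypothetical note column, standing in for interesting_ips.r17_native.

```python
import re
import sys

# Same pattern as the r17 example: capture the client IP and the bracketed date.
LOG_PATTERN = re.compile(r'^([^ ]+?) [^ ]+? [^ ]+? \[([^\]]+?)\]')


def parse_log(lines):
    """Turn raw access-log lines into rows; lines that do not match are skipped."""
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m:
            yield {"ip_address": m.group(1), "date": m.group(2)}


def natural_join(rows, other):
    """Join each streamed row to every row of `other` that agrees on all shared columns."""
    other = list(other)  # the static side is small and read once
    for row in rows:
        for o in other:
            shared = row.keys() & o.keys()
            if all(row[c] == o[c] for c in shared):
                yield {**row, **o}


def to_tsv(rows):
    """Render rows as tab-separated lines, columns in the row's own order."""
    for row in rows:
        yield "\t".join(str(v) for v in row.values())


def run(stream, interesting, out=None):
    """Process a (possibly endless) stream line by line, flushing each result."""
    for line in to_tsv(natural_join(parse_log(stream), interesting)):
        print(line, file=out, flush=True)
```

Because every stage is a generator, run(sys.stdin, interesting) at the end of a tail -f pipe emits each match as soon as its log line arrives rather than waiting for end of input.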
Expressions can also be conditional, as in this rel.select clause:
rel.select(
(if (str.starts_with(name, "Johann") || (j_i > 0)) then (
"Another Johann"
) else (
"No Johann"
)) as johann_nature);
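The per-row logic of that conditional, written as a small Python sketch, is shown below. It assumes the following:

- name is a string column and j_i is an integer column. The source does not say what j_i holds.
- rel.select with a single expression produces rows containing only the johann_nature column.
- r17's || behaves as a logical OR, like Python's `or`.

```python
def johann_nature(row):
    """Mirror of the r17 conditional: name prefix OR a positive j_i."""
    if row["name"].startswith("Johann") or row["j_i"] > 0:
        return "Another Johann"
    return "No Johann"


def select_johann_nature(rows):
    """Assumed rel.select(... as johann_nature) shape: one output column per row."""
    for row in rows:
        yield {"johann_nature": johann_nature(row)}
```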
r17 is supported on 32- and 64-bit Linux and Mac OS X; other UNIX-like platforms are available on request. It ships as a single executable of about 1 MB that depends only on the lowest-level OS-supplied libraries.
Thanks to Dave Gamache for the Skeleton template.