Tags: Browse Projects

datafari

  Analyzed 2 days ago

Datafari is an open source enterprise search solution. It provides the following features: big data scalability, data and search analytics, semantics, and security. It comes with a responsive UI and a graphical administration interface.

406K lines of code

6 current contributors

3 days since last commit

2 users on Open Hub

Moderate Activity
0.0

Crossdata

  Analyzed 3 days ago

Easy access to big things. A library for Apache Spark that extends and improves its capabilities.

30.7K lines of code

2 current contributors

over 4 years since last commit

2 users on Open Hub

Inactive
5.0

vespa

  Analyzed 1 day ago

Vespa is an engine for low-latency computation over large data sets.

1.61M lines of code

42 current contributors

2 days since last commit

2 users on Open Hub

Very High Activity
0.0
Licenses: No declared licenses

Disco

  Analyzed 3 days ago

Disco is an open-source implementation of the Map-Reduce framework for distributed computing. Like the original framework, Disco supports parallel computations over large data sets on unreliable clusters of computers. The Disco core is written in Erlang, a functional language designed for building robust, fault-tolerant distributed applications. Users of Disco typically write jobs in Python, which makes it possible to express even complex algorithms or data processing tasks in often only tens of lines of code. This means that you can quickly write scripts to process massive amounts of data.

29.8K lines of code

0 current contributors

over 7 years since last commit

2 users on Open Hub

Inactive
0.0

Apache Falcon

Claimed by Apache Software Foundation

Analyzed about 14 hours ago

Apache Falcon is a data processing and management solution for Hadoop designed for data motion, coordination of data pipelines, lifecycle management, and data discovery. Falcon enables end consumers to quickly onboard their data and its associated processing and management tasks on Hadoop clusters. Data management on Hadoop encompasses data motion, process orchestration, lifecycle management, and data discovery, among other concerns. Falcon enables easy data management on Hadoop via a declarative mechanism: users of the Falcon platform simply define infrastructure endpoints, data sets, and processing rules declaratively. This information about the inter-dependencies between entities allows Falcon to orchestrate and manage various data management functions.

169K lines of code

0 current contributors

almost 6 years since last commit

2 users on Open Hub

Inactive
5.0

Apache Whirr

Claimed by Apache Software Foundation

Analyzed 5 days ago

Apache Whirr is a set of libraries for running cloud services. Whirr provides:

* A cloud-neutral way to run services. You don't have to worry about the idiosyncrasies of each provider.
* A common service API. The details of provisioning are particular to the service.
* Smart defaults for services. You can get a properly configured system running quickly, while still being able to override settings as needed.

You can also use Whirr as a command-line tool for deploying clusters.

26.9K lines of code

0 current contributors

almost 9 years since last commit

2 users on Open Hub

Inactive
0.0

Universal Java Matrix Package (UJMP)

  Analyzed 2 days ago

The Universal Java Matrix Package (UJMP) is an open source Java library that provides sparse and dense matrix classes, as well as a large number of linear algebra calculations such as matrix multiplication and matrix inversion. Operations such as mean, correlation, standard deviation, replacement of missing values, and mutual information are also supported. Matrices can be imported from and exported to a large number of file formats, and linking to JDBC databases is also supported. The Universal Java Matrix Package supports multidimensional matrices as well as generic matrices with a specified object type, and it can handle very large matrices even when they do not fit into memory.

84.9K lines of code

0 current contributors

almost 9 years since last commit

2 users on Open Hub

Inactive
0.0

OpenVDS

  Analyzed 2 days ago

OpenVDS is an open-source reference implementation of a storage format for fast random access to multi-dimensional (up to 6D) volumetric data stored in an object storage cloud service (e.g., Amazon S3, Azure Blob Storage, or Google Cloud Storage). OpenVDS may be used to store E&P data types such as regularized single-Z horizons/height-maps (2D), seismic lines (2D), pre-stack volumes (3D-5D), post-stack volumes (3D), geobody volumes (3D-5D), and attribute volumes of any dimensionality up to 6D. The format has been designed primarily to support random access and on-demand fetching of data; this enables responsive, interactive applications as well as efficient I/O for high-performance computing and machine learning workloads. OpenVDS is based on Bluware's VDS format.

101K lines of code

0 current contributors

17 days since last commit

2 users on Open Hub

Moderate Activity
0.0

dpark

  Analyzed 2 days ago

A Python clone of Spark, a MapReduce-like framework.

20K lines of code

2 current contributors

over 3 years since last commit

1 user on Open Hub

Inactive
0.0
Licenses: No declared licenses

talend-bridge-api

Claimed by The OW2 Consortium

Analyzed 4 days ago

An API to help developers build Talend Open Studio components in an OOP way. This API provides accessors, data structures, lists, an ORM layer, Talend type checking, and connectors to help build and debug TOS components while minimizing the number of lines of code that need to be written in JET template files (pain!).

1.22K lines of code

0 current contributors

about 10 years since last commit

1 user on Open Hub

Inactive
0.0
Licenses: No declared licenses