-
under Apache License 2.0 license
-
Elasticsearch real-time search and analytics natively integrated with Hadoop
-
under Apache License 2.0 license
-
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
-
under Apache License 2.0 license
-
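EasyReport is a simple, easy-to-use web reporting system for Java (supporting Hadoop, HBase, and various relational databases). Its main function is to convert the row-and-column result of a SQL query into an HTML table, with support for cells that span rows (RowSpan) and columns (ColSpan). It also supports exporting reports to Excel, displaying charts, and freezing the table header and leftmost columns.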
-
under Apache License 2.0 license
-
Gobblin is a distributed big data integration framework (ingestion, replication, compliance, retention) for batch and streaming systems. Gobblin features integrations with Apache Hadoop, Apache Kafka, Salesforce, S3, MySQL, Google, and others.
-
under Apache License 2.0 license
-
AI on Hadoop
-
under Apache License 2.0 license
-
The Esri Geometry API for Java enables developers to write custom applications for analysis of spatial data. This API is used in the Esri GIS Tools for Hadoop and other 3rd-party data processing solutions.
-
under Apache License 2.0 license
-
Learning to write Hadoop examples
-
under Apache License 2.0 license
-
A tool for provisioning and managing Apache Hadoop clusters in the cloud. Cloudbreak, as part of the Hortonworks Data Platform, makes it easy to provision, configure and elastically grow HDP clusters on cloud infrastructure. Cloudbreak can be used to provision Hadoop across cloud infrastructure providers including AWS, Azure, GCP and OpenStack.
-
under Apache License 2.0 license
-
Hadoop Configurations
-
under Apache License 2.0 license
-
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
-
under Apache License 2.0 license
-
The Spatial Framework for Hadoop allows developers and data scientists to use the Hadoop data processing system for spatial data analysis.
-
under Apache License 2.0 license
-
Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB
-
under Apache License 2.0 license
-
Scalable, redundant, and distributed object store for Apache Hadoop
-
under Apache License 2.0 license
-
Code repository for the O'Reilly book "Hadoop Application Architectures"
-
under Apache License 2.0 license
-
hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE
-
under Apache License 2.0 license
-
Big data practice projects: Hadoop, Spark, Kafka, HBase, Flink, ...
-
under Apache License 2.0 license
-
Extensible set of Storm topologies and topology attributes for streaming, enriching, indexing, and storing telemetry in Hadoop.
-
under Apache License 2.0 license
-
Source code for assignments of Udacity course "Introduction to Hadoop and MapReduce"
-
under Apache License 2.0 license
-
"Hadoop Illuminated", a book about Hadoop
-
under Apache License 2.0 license
-
Source code that accompanies the book "Hadoop in Practice, Second Edition".
-
under Apache License 2.0 license
-
This project plays video files stored in HDFS (Hadoop) online in a web page.
-
under MIT License license
-
Customer product search click analytics using Hadoop, Hive, Oozie, Elasticsearch, Akka, and Spring Data
-
under Apache License 2.0 license
-
Fast and efficient batch computation engine for complex analysis and reporting of massive datasets on Hadoop
-
under Apache License 2.0 license
-
MrGeo is a geospatial toolkit designed to provide raster-based geospatial capabilities that can be performed at scale. MrGeo is built upon Apache Spark and the Hadoop ecosystem to leverage the storage and processing of hundreds of commodity computers. See the wiki for more details.
-
under GNU General Public License v2.0 license
-
A distributed web crawler based on Hadoop's design approach.
-
under Apache License 2.0 license
-
Hadoop Map-Reduce Design Patterns
-
under Apache License 2.0 license
-
SequenceIQ Hadoop examples
-
under MIT License license
-
Hadoop-BAM is a Java library for the manipulation of files in common bioinformatics formats using the Hadoop MapReduce framework
-
under Apache License 2.0 license
-
Hadoop output committers for S3
-
under MIT License license
-
CommonCrawl WARC/WET/WAT examples and processing code for Java + Hadoop
-
under Apache License 2.0 license
-
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
-
under GNU Lesser General Public License v2.1 license
-
cephfs-hadoop
-
under Apache License 2.0 license
-
Code for a tutorial on designing a clickstream analytics application using Hadoop
-
under Apache License 2.0 license
-
Hadoop utilities for Kafka, S3, and more
-
under Apache License 2.0 license
-
-
under Apache License 2.0 license
-
The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore-compatible metadata repository. Customers can use the Data Catalog as a central repository to store structural and operational metadata for their data. AWS Glue provides out-of-the-box integration with Amazon EMR that enables customers to use the AWS Glue Data Catalog as an external Hive Metastore. This is an open-source implementation of the Apache Hive Metastore client on Amazon EMR clusters that uses the AWS Glue Data Catalog as an external Hive Metastore. It serves as a reference implementation for building a Hive Metastore-compatible client that connects to the AWS Glue Data Catalog. It may be ported to other Hive Metastore-compatible platforms, such as other Hadoop and Apache Spark distributions.
-
under Apache License 2.0 license
-
Code to index HDFS to Solr using MapReduce
-
under Apache License 2.0 license
-
A Java NIO file system provider for HDFS
-
under Apache License 2.0 license
-
Ephemeral Hadoop clusters using Google Compute Platform
-
under Apache License 2.0 license
-
This repository contains machine learning MapReduce code for Hadoop written from scratch (without using any package or library), e.g. prediction (linear and logistic regression), clustering (k-means), and classification (KNN).
-
under Apache License 2.0 license
-
Quickly build large-scale Elasticsearch indices by using the fault tolerance and parallelism of Hadoop
-
under Apache License 2.0 license
-
Easy parsing of Apache HTTPD and NGINX access logs with Java, Hadoop, Hive, Pig, Flink, Beam, Storm, Drill, ...
-
under Apache License 2.0 license
-
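Fitting is a unified development framework for big data, led by Dakuai Search (大快搜索) and fully open source. It addresses common problems in big data development, such as the wide range of technologies involved and the lack of shared conventions across components. As a result, it lowers the learning curve for big data, improves development efficiency on big data projects, and can be used alongside other open-source projects. Fitting is released under the Apache 2.0 license and uses a black-box-style framework pattern. It wraps the low-level APIs of components across the big data ecosystem, grouped by application, into Fitting API services. By referencing the Fitting framework directly, developers can use the feature-rich Fitting API to accomplish what previously required complex code. The Fitting framework consists of nine major parts, each of which can be deployed separately or all together: data processing (dataprocess), data sources (datasource), the ElasticSQL engine (elasticsql), graph computing (graphx), machine learning (ml), natural language processing (nlp), search (search), SQL utilities (sqlutils), and stream computing (stream). Fitting supports more than twenty programming languages, including C, C++, C#, Cocoa, Common Lisp, Dlang, Dart, Delphi, Erlang, Go, Haskell, Haxe, Java (SE), Java (ME), Lua, node.js, OCaml, Perl, PHP, Python, Ruby, Rust, and Smalltalk.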
-
under MIT License license
-
Hands-on practice examples for learning Hadoop and MapReduce programming
-
under Apache License 2.0 license
-
HDFS Shell is an HDFS manipulation tool that works with the functions built into Hadoop DFS
-
under Apache License 2.0 license
-
Hadoop MapReduce job to bulk-load data into Cassandra
-
under Apache License 2.0 license
-
Exports Hadoop HDFS content statistics to Prometheus
-
under Apache License 2.0 license
-
HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
-
under Apache License 2.0 license
-
YARN example source code accompanying the Wikibooks title "Beginning Hadoop Programming" by Jaehwa Jung
-
under Apache License 2.0 license
-
Study demos for Hadoop, HBase, Hive, Spark, Storm, ...
-
under Apache License 2.0 license
-
A Java implementation of the Raft protocol for the Hadoop ecosystem
-
under Apache License 2.0 license
-
Aerospike Hadoop Connector
-
under Apache License 2.0 license
-
MarkLogic Connector for Hadoop and MarkLogic Contentpump (mlcp)
-
under Apache License 2.0 license
-
Cryptographic library optimized with AES-NI
-
under Apache License 2.0 license
-
Examples and Slides for "Introduction to Spring for Apache Hadoop" at SpringOne2GX 2014
-
under Apache License 2.0 license
-
Java event log collector for Hadoop and related frameworks
-
under Apache License 2.0 license
-
A Hadoop input format and a Hive storage handler that let you access data stored in Windows Azure Storage tables from within a Hadoop (or HDInsight) cluster.
-
under Apache License 2.0 license
-
Titan 1.0 with TP3.1 (Master Branch), HBase 1.1.1 and Hadoop 2.7.1 support
-
under Apache License 2.0 license
-
-
under Apache License 2.0 license
-
Apache Hive / Hadoop Spring Boot Microservice
-
under Apache License 2.0 license
-
MongoDB-Hadoop Workshop Exercises
-
under Apache License 2.0 license
-
Zephyr is a big data, platform-agnostic ETL API, with Hadoop MapReduce, Storm, and other big data bindings.
-
under Apache License 2.0 license
-
HadoopCV: a video reader for Hadoop and Spark
-
under GNU Affero General Public License v3.0 license
-
Highly available (HA) distributed system to transactionally move data from sources (e.g., Kafka) to sinks (e.g., Hadoop HDFS)
-
under MIT License license
-
Example program that writes Parquet-formatted data to plain files (i.e., not Hadoop HDFS); Parquet is a columnar storage format.
-
under MIT License license
-
WARC (Web Archive) Input and Output Formats for Hadoop
-
under Apache License 2.0 license
-
Splittable Input Format for Reading Cassandra SSTables Directly
-
under Apache License 2.0 license
-
MapReduce implementation of a community detection algorithm that extends the Louvain method.
-
under MIT License license
-
Hadoop jobs for WikiReverse project. Parses Common Crawl data for links to Wikipedia articles.
-
under MIT License license
-
The simplest way to test Kafka-based applications or microservices, e.g. reads and writes during HBase/Hadoop or other data ingestion pipelines
-
under Apache License 2.0 license
-
Implementation of attributed network pattern matching based on Hadoop MapReduce over Hive.
-
under MIT License license
-
The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
-
under Apache License 2.0 license
-
Data analysis using Hadoop, Spark, Storm, Elasticsearch, ML, etc. These are my daily notes, code, and demos. Don't fork, just star!
-
under Apache License 2.0 license
-
Scriptable scheduler for periodical Hadoop workflows
-
under Apache License 2.0 license
-
Public Hadoop release repository
-
under Apache License 2.0 license
-
Set of Hadoop input/output formats for use with Hadoop Streaming