Flume


[Flume - Agents]

Flume agents are independent daemon processes. A Flume agent consists of three components: a source, a channel and a sink.

  • A source receives data from the data generators and passes it on to the channel as events.

Examples - tweets, web server logs, click-event data from any application, etc.

  • A channel is a transient store that receives events from the source and buffers them until they are consumed by the sinks. It acts as a bridge between the sources and the sinks.

Examples - file channel, memory channel, etc.

  • A sink consumes events from a channel and delivers them to the destination.

Examples - HDFS, HBase, etc.
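To make the three roles concrete, here is a minimal in-memory sketch in Python. It is not Flume's Java API: the `Event`, `Source`, `Channel` and `Sink` classes are hypothetical stand-ins that only loosely mirror Flume's events (a byte body plus string headers), and a plain list stands in for a destination such as HDFS.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical event: a byte payload plus string headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


class Channel:
    """Transient store that buffers events between a source and a sink."""

    def __init__(self):
        self._events = deque()

    def put(self, event):
        self._events.append(event)

    def take(self):
        # Oldest buffered event first, or None if the channel is empty.
        return self._events.popleft() if self._events else None


class Source:
    """Receives raw records from a data generator and puts them into a channel."""

    def __init__(self, channel):
        self.channel = channel

    def receive(self, record: str):
        self.channel.put(Event(body=record.encode("utf-8")))


class Sink:
    """Takes events from a channel and delivers them to a destination."""

    def __init__(self, channel, destination: list):
        self.channel = channel
        self.destination = destination

    def process(self) -> bool:
        event = self.channel.take()
        if event is None:
            return False  # nothing left to deliver
        self.destination.append(event.body.decode("utf-8"))
        return True


# Wire one agent together: source -> channel -> sink.
channel = Channel()
source = Source(channel)
store = []  # stands in for HDFS, HBase, etc.
sink = Sink(channel, store)

for line in ["GET /index.html", "GET /about.html"]:
    source.receive(line)
while sink.process():
    pass
print(store)  # ['GET /index.html', 'GET /about.html']
```

The source and sink never talk to each other directly; the channel is the only thing they share, which is what lets each side run at its own pace.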

When data arrives from the source faster than it can be written to the destination, the Flume channel acts as a mediator between the source and the sink by buffering the data.
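The buffering behaviour can be simulated with a fixed-capacity queue. This is a hypothetical sketch, not Flume code: `BoundedChannel` and `simulate` are invented for illustration, the capacity loosely corresponds to the `capacity` setting of Flume's memory channel, and refused events are only counted here, whereas a real source would have to back off and retry.

```python
from collections import deque


class BoundedChannel:
    """Hypothetical fixed-capacity channel between a source and a sink."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.events = deque()

    def put(self, event) -> bool:
        # A full channel refuses the event; a real source would back off and retry.
        if len(self.events) >= self.capacity:
            return False
        self.events.append(event)
        return True

    def take(self):
        # Oldest event first, or None when the buffer is empty.
        return self.events.popleft() if self.events else None


def simulate(arrivals_per_tick, sink_rate, capacity):
    """Feed bursts into the channel while the sink drains at most `sink_rate` events per tick.

    Returns (delivered, refused, peak_buffered).
    """
    channel = BoundedChannel(capacity)
    delivered = refused = peak = 0
    for tick, arriving in enumerate(arrivals_per_tick):
        for i in range(arriving):
            if not channel.put((tick, i)):
                refused += 1
        peak = max(peak, len(channel.events))
        for _ in range(sink_rate):
            if channel.take() is None:
                break
            delivered += 1
    return delivered, refused, peak


# A burst of 8 events arrives at once, but the sink writes only 2 per tick.
print(simulate([8, 0, 0, 0], sink_rate=2, capacity=10))  # (8, 0, 8)
print(simulate([8, 0, 0, 0], sink_rate=2, capacity=5))   # (5, 3, 5)
```

With enough capacity the burst is absorbed and delivered over several ticks; with too little, the channel fills up and further events are refused until the sink catches up.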

[Flume - Use Case - Agents]

A Flume agent runs on every machine from which we want to collect data.

As shown in the image, in our earlier use case a Flume agent is installed on every web server that produces data. A data collector gathers the data from these agents and pushes it to a centralized data store.
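The fan-in pattern of this use case can be sketched as follows, assuming each agent pushes its buffered events to the collector. `WebServerAgent`, `Collector` and their methods are hypothetical names used only for illustration; in a real Flume deployment the hop between an agent and the collector is typically an Avro sink on the agent sending to an Avro source on the collector.

```python
class Collector:
    """Hypothetical collector tier: receives events from many agents, writes to one store."""

    def __init__(self):
        self.central_store = []  # stands in for the centralized store (e.g. HDFS)

    def receive(self, event):
        self.central_store.append(event)


class WebServerAgent:
    """Hypothetical agent installed on a single web server."""

    def __init__(self, host, collector):
        self.host = host
        self.collector = collector
        self.buffer = []  # plays the role of the agent's channel

    def collect(self, line):
        # Tag each event with the host it came from, similar to a Flume event header.
        self.buffer.append({"host": self.host, "body": line})

    def flush(self):
        # The agent's sink pushes every buffered event, oldest first, to the collector.
        while self.buffer:
            self.collector.receive(self.buffer.pop(0))


collector = Collector()
web1 = WebServerAgent("web1", collector)
web2 = WebServerAgent("web2", collector)

web1.collect("GET /home 200")
web2.collect("GET /cart 404")
web1.collect("POST /login 302")

for agent in (web1, web2):
    agent.flush()

for event in collector.central_store:
    print(event["host"], event["body"])
# web1 GET /home 200
# web1 POST /login 302
# web2 GET /cart 404
```

Tagging events with their origin host keeps the data traceable after it has been merged into a single store.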

[Flume - Multiple Agents]

Flume agents can be arranged in arbitrary topologies. As shown in the image, the source of one agent can consume the data sent by the sink of another agent, and the output of a single agent can be consumed by multiple downstream agents.
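A topology like this can be modelled as a small graph of agents, where each edge means that one agent's sink delivers to another agent's source. The sketch below is hypothetical (the `Agent` class and its `connect`/`receive` methods are invented) and assumes an acyclic topology, since a cycle would forward events forever. It shows both fan-in, where two agents feed one aggregator, and fan-out, where the aggregator copies every event to two downstream agents. In Flume itself, fan-out is handled by a channel selector, whose default `replicating` type copies each event to all of the source's channels.

```python
class Agent:
    """Hypothetical agent: its source accepts events, its sinks forward them downstream."""

    def __init__(self, name):
        self.name = name
        self.downstream = []  # agents whose sources receive this agent's output
        self.received = []    # every event this agent's source has accepted

    def connect(self, other):
        # Point one of this agent's sinks at the other agent's source.
        self.downstream.append(other)
        return other

    def receive(self, event):
        self.received.append(event)
        # Fan-out: copy the event to every downstream agent.
        for agent in self.downstream:
            agent.receive(event)


# Two web agents fan in to one aggregator, which fans out to two writers.
web1, web2 = Agent("web1"), Agent("web2")
aggregator = Agent("aggregator")
web1.connect(aggregator)
web2.connect(aggregator)
hdfs_writer = aggregator.connect(Agent("hdfs-writer"))
hbase_writer = aggregator.connect(Agent("hbase-writer"))

web1.receive("click:42")
web2.receive("click:7")

print(aggregator.received)    # ['click:42', 'click:7']
print(hdfs_writer.received)   # ['click:42', 'click:7']
print(hbase_writer.received)  # ['click:42', 'click:7']
```

Because every hop is just "sink to source", new tiers can be added or removed without changing the agents on either side.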