Apache NiFi – The Complete Guide (Part 1)


2 Mar, 2019


What is Apache NiFi?

Apache NiFi is a robust open-source framework for data ingestion and distribution. It can move data of almost any format from almost any source to almost any destination.

NiFi is based on a programming paradigm called Flow-Based Programming (FBP). I’m not going to give the formal definition of Flow-Based Programming. Instead, I will explain how NiFi works, and then you can connect that to the definition yourself.

How Does NiFi Work?

NiFi consists of atomic elements which can be combined into groups to build simple or complex dataflows.

NiFi has Processors and Process Groups.

What is a Processor?

A Processor is an atomic element in NiFi that performs one specific task.

At the time of writing, the latest version of NiFi has more than 280 processors, each with its own responsibility.

For example, the GetFile processor reads a file from a specific location, whereas the PutFile processor writes a file to a particular location. There are many other processors like these, each with its own purpose.
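To make the GetFile and PutFile idea concrete, here is a small Python sketch. It is only a conceptual model and not NiFi code: the functions `get_file` and `put_file` and the `inbox` and `outbox` directories are names I made up for illustration.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def get_file(directory: Path) -> List[Tuple[str, bytes]]:
    """Read every regular file in the directory as (name, content) pairs."""
    return [(p.name, p.read_bytes()) for p in sorted(directory.iterdir()) if p.is_file()]


def put_file(directory: Path, name: str, content: bytes) -> Path:
    """Write the content to a file with the given name in the target directory."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / name
    target.write_bytes(content)
    return target


with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp) / "inbox"    # hypothetical source location
    outbox = Path(tmp) / "outbox"  # hypothetical destination location
    inbox.mkdir()
    (inbox / "hello.txt").write_text("hello nifi")

    # "GetFile" hands each file to "PutFile".
    for name, content in get_file(inbox):
        put_file(outbox, name, content)

    copied = (outbox / "hello.txt").read_text()
```

Each function does one small job, which is the point of a processor: the reader knows nothing about where the data ends up, and the writer knows nothing about where it came from.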

We have processors to get data from various data sources and processors to write data to various data sources.

The data source can be almost anything.

It can be a SQL database server such as Postgres, Oracle, or MySQL. It can be a NoSQL database such as MongoDB, Couchbase, or HBase. It can also be a search engine such as Solr or Elasticsearch, or a cache server such as Redis. NiFi can even connect to a messaging system such as Kafka.

NiFi also has a rich set of processors to connect with Amazon AWS services like S3 and DynamoDB.

NiFi has a processor for almost everything you need when you typically work with data. We will go deep into the various types of processors available in NiFi in later videos. Even if you don’t find the right processor for your requirement, NiFi gives you a simple way to write your own custom processors.

Now let’s move on to the next term, FlowFile.

What is a FlowFile?

The actual data in NiFi propagates in the form of a FlowFile. A FlowFile can contain any data, say CSV, JSON, XML, or plain text, and it can even be SQL queries or binary data.

The FlowFile abstraction is the reason NiFi can propagate any data from any source to any destination. A processor can process a FlowFile to generate a new FlowFile.
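Here is a minimal Python sketch of that idea. It is a toy model, not the NiFi API. The `FlowFile` class pairs opaque content bytes with key/value attributes, which is how NiFi FlowFiles are structured. The `uppercase_processor` function and the `transformed` attribute are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class FlowFile:
    """Toy stand-in for a FlowFile: opaque content plus string attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


def uppercase_processor(flowfile: FlowFile) -> FlowFile:
    """Hypothetical processor: builds a NEW FlowFile with upper-cased text."""
    new_content = flowfile.content.decode("utf-8").upper().encode("utf-8")
    new_attributes = {**flowfile.attributes, "transformed": "true"}
    return FlowFile(content=new_content, attributes=new_attributes)


original = FlowFile(b"id,name\n1,alice\n", {"filename": "users.csv"})
result = uppercase_processor(original)
```

Because the content is just bytes, the same wrapper works for CSV, JSON, or binary data. The processor returns a new FlowFile and leaves the original untouched.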

The next important term is Connections.

In NiFi, processors can be connected to create a dataflow. The link between two processors is called a Connection. Each connection also acts as a queue for FlowFiles.
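The queue behaviour of a connection can be sketched in a few lines of Python. This is an assumption-laden toy, not how NiFi is implemented: `split_lines` and `to_upper` are made-up processors, and a plain `deque` plays the role of the connection.

```python
from collections import deque
from typing import Deque, List


def split_lines(text: str) -> List[str]:
    """Hypothetical upstream processor: emits one item per line."""
    return text.splitlines()


def to_upper(line: str) -> str:
    """Hypothetical downstream processor: upper-cases one item."""
    return line.upper()


# The connection: a FIFO queue sitting between the two processors.
connection: Deque[str] = deque()

# The upstream processor puts its output on the connection.
for line in split_lines("a\nb\nc"):
    connection.append(line)

queued = len(connection)  # items waiting for the downstream processor

# The downstream processor drains the queue at its own pace.
output: List[str] = []
while connection:
    output.append(to_upper(connection.popleft()))
```

The two processors never call each other directly. The queue in the middle lets the upstream side keep producing even when the downstream side is slower.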

Next come Process Groups and Input and Output Ports.

In NiFi, one or more connected processors can be combined into a Process Group. When you have a complex dataflow, it’s better to combine processors into logical process groups. This makes the flows easier to maintain.

Process Groups can have input and output ports which are used to move data between them.
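A rough Python sketch of process groups and ports follows. Everything in it is illustrative rather than NiFi's real design: the `ProcessGroup` class, the `connect` helper, and the `clean` and `tag` groups are hypothetical, and each port is modelled as a simple queue.

```python
from collections import deque
from typing import Callable, Deque, List


class ProcessGroup:
    """Toy process group: a chain of processors between two ports."""

    def __init__(self, name: str, processors: List[Callable[[str], str]]) -> None:
        self.name = name
        self.processors = processors
        self.input_port: Deque[str] = deque()
        self.output_port: Deque[str] = deque()

    def run(self) -> None:
        """Pass each item from the input port through every processor in order."""
        while self.input_port:
            item = self.input_port.popleft()
            for processor in self.processors:
                item = processor(item)
            self.output_port.append(item)


def connect(source: ProcessGroup, target: ProcessGroup) -> None:
    """Move everything from the source's output port to the target's input port."""
    while source.output_port:
        target.input_port.append(source.output_port.popleft())


clean = ProcessGroup("clean", [str.strip, str.lower])
tag = ProcessGroup("tag", [lambda s: f"record:{s}"])

clean.input_port.extend(["  Alice ", "BOB"])
clean.run()
connect(clean, tag)
tag.run()

results = list(tag.output_port)
```

From the outside, each group is just an input port and an output port. The processors inside can change without affecting the other group.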

The final term you should know for now is Controller Services.

Controller Services are shared services that can be used by Processors. For example, a processor that reads from or writes to a SQL database can use a Controller Service that holds the required DB connection details.

Controller Services are not limited to DB connections.
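The database example above can be sketched in Python as follows. This is a conceptual model only. `DatabaseConnectionService`, `put_rows`, `get_rows`, and the `users` table are hypothetical names, and an in-memory SQLite database stands in for a real SQL server.

```python
import sqlite3
from typing import List, Optional


class DatabaseConnectionService:
    """Toy shared service: keeps the connection details in one place."""

    def __init__(self, database: str) -> None:
        self.database = database
        self._connection: Optional[sqlite3.Connection] = None

    def get_connection(self) -> sqlite3.Connection:
        """Open the connection on first use, then reuse it for every caller."""
        if self._connection is None:
            self._connection = sqlite3.connect(self.database)
        return self._connection


def put_rows(service: DatabaseConnectionService, names: List[str]) -> None:
    """Hypothetical 'put' processor: writes rows using the shared service."""
    conn = service.get_connection()
    conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [(n,) for n in names])
    conn.commit()


def get_rows(service: DatabaseConnectionService) -> List[str]:
    """Hypothetical 'get' processor: reads rows using the same shared service."""
    conn = service.get_connection()
    return [row[0] for row in conn.execute("SELECT name FROM users ORDER BY name")]


service = DatabaseConnectionService(":memory:")  # assumed stand-in for a real DB
put_rows(service, ["bob", "alice"])
names = get_rows(service)
```

Neither processor knows the connection details. If the database moves, only the service changes, and every processor that uses it picks up the new settings.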

Who this course is for:
  • Software Engineers
  • Data Engineers
  • Software Architects
  • Data Scientists