Recent Posts

Checking HDFS Health Using fsck.

3 minute read

When a cluster holds large data sets, some blocks will eventually become corrupt. This can be due to disk failures or other causes.

MongoDB to Neo4j Using neo4j_doc_manager.

10 minute read

We had a requirement to replicate all the data in MongoDB to Neo4j in order to show a few graphs. Here is a quick way to demonstrate ...

Cropping Bulk Images Using Python.

3 minute read

I was working on getting headers for my posts on this blog. I had a couple of images from Unsplash, but the header for the post needed to be a little more ho...

Kafka Kerberos Enable and Testing.

19 minute read

Apache Kafka is a distributed streaming platform. Kafka 2.0 supports Kerberos authentication. Enabling Kerberos Authentication Using the Wizard on Cloudera m...

Parcel Not Distributing Cloudera CDH.

1 minute read

We were deploying one of the clusters in our lab environment, which is used by everyone, so the lab has its own share of stale information.