Recent Posts

Long Running Jobs in YARN distcp.

1 minute read

distcp and other batch jobs often run for a very long time. This is fine if you are running a Hadoop environment without Kerberos. When we kerberize a clus...

Checking HDFS health using fsck.

3 minute read

When we have large data sets on the cluster, some blocks will eventually become corrupt. This could be due to disk failures or other causes.

MongoDB to Neo4j Using neo4j_doc_manager.

10 minute read

We had a requirement to replicate all the data in MongoDB to Neo4j in order to show a few graphs. Here is a quick way to demonstrate ...

Cropping Bulk Images Using Python.

3 minute read

I was working on getting post headers for my posts on this blog. I had a couple of images from Unsplash, but the header for the post needs to be a little more ho...