Jim Ferris
48 followers

Working with Confluent's Schema Registry and Kerberos
Schemas, metadata for your data, are still very valuable regardless of all of the noise around schema-less data storage. But as the old adage goes, "if all you have is a hammer, everything looks like a nail". As it turns out, schemas are very valuabl...

Using the XML SerDe in Hive for Exploding Nested XML Elements
This article gives a detailed account of how to install and use the XML SerDe for Hive, by Dmitry Vasilenko, located on GitHub here. Installing: Detailed instructions for the installation can be found here, but I'm going to take you throu...

Configuring Streamsets Data Collector With Hashicorp Vault Using AppRole
In my previous post I detailed how to install and configure Hashicorp Vault using the AppID auth backend to work with Streamsets Data Collector. Now that the AppID auth backend has been deprecated, the AppRole auth backend is the Vault backend of c...

Installing CouchDB on Red Hat Enterprise Linux
I got this up and running on RHEL 7.3. It took a while to get through all the dependencies. Contrary to what the CouchDB Installation page says, there is no RPM for this. You're going to have to download, compile, punch things to get this installed....

Using SQL Server Change Tracking with Streamsets Data Collector Kudu Destination
I just started using Apache Kudu for a few projects that require near-real-time data, as well as for our ETL logging for analytics. It's a really great storage engine if you like working with relational data, and if you can get around the limitations....

Recovering From an Avro to Parquet Conversion Failure in Streamsets Data Collector
Version 2.6 of Streamsets Data Collector added data drift support for the Parquet file format. This means that if your source system changes, e.g. a new column has been added to the table/flat file/etc., Data Collector will update the target...

Creating a Sqoop Metastore in Postgres
If you are using Sqoop to load data from your source system relational databases into Hadoop, you probably want a way to track the deltas. This way you won't have to do a full load every time or build some delta mechanism on your own. The sqoop metas...

Configuring Streamsets Data Collector with Hashicorp Vault
I've been using Streamsets Data Collector a lot lately in my work, and I'm really impressed with it. It has a really nice UI and lots of components that come out of the box with the product. As the name implies, this product was built for streaming d...

This dude is like a distributed database Sherlock. Love his post on how he destroys MongoDB...