We were investigating a weird Spark exception recently. It appeared on Apache Spark jobs that had been running fine until then. The only difference we could find was an upgrade from IBM BigReplicate 4.1.1 to 4.1.2 (based on WANdisco Fusion 2.11, I believe).
This is the stack trace we saw:
Caused by: java.lang.IllegalArgumentException: interface com.wandisco.fs.common.ApiCompatibility is not visible from class loader
    at java.lang.reflect.Proxy$ProxyClassFactory.apply(Proxy.java:581)
    at java.lang.reflect.Proxy$ProxyClassFactory.apply(Proxy.java:557)
    at java.lang.reflect.WeakCache$Factory.get(WeakCache.java:230)
    at java.lang.reflect.WeakCache.get(WeakCache.java:127)
    at java.lang.reflect.Proxy.getProxyClass0(Proxy.java:419)
    at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:719)
    at com.wandisco.fs.common.CompatibilityAdaptor.createAdaptor(CompatibilityAdaptor.java:26)
    at com.wandisco.fs.client.FusionCommon.<init>(FusionCommon.java:79)
    at com.wandisco.fs.client.ReplicatedFC.<init>(ReplicatedFC.java:131)
    at com.wandisco.fs.client.ReplicatedFC.<init>(ReplicatedFC.java:125)
    at com.wandisco.fs.client.ReplicatedFC.get(ReplicatedFC.java:1078)
    at com.wandisco.fs.client.FusionCommonFactory.newInstance(FusionCommonFactory.java:99)
    at com.wandisco.fs.client.FusionCommonFactory.get(FusionCommonFactory.java:59)
    ... 48 more
It took us a while to investigate, and we learned a couple of things along the way:
- -verbose:class, a JVM flag that shows you every class the JVM loads and which file it loads it from
- jdb, which is fabulous when you have no way to connect an IDE for remote debugging (firewall etc.). There are multiple different transports for communication between the debugger and a running virtual machine. Don't do what we did and blindly follow the first link Google gives you, as that was (in our case) the documentation for jdb on Windows. Windows has a transport called dt_shmem (shared memory) which is not available on Linux. On Linux you want to use dt_socket.
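For reference, the two tools come together roughly like this (app.jar, com.example.Main, and port 8000 are placeholders for your own application):

```shell
# Print every class the JVM loads and where it was loaded from
java -verbose:class -cp app.jar com.example.Main

# Start the target JVM with the JDWP agent using the dt_socket transport
java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000 \
     -cp app.jar com.example.Main

# Attach jdb from another terminal
jdb -attach localhost:8000
```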
Back to the problem at hand: We developed a minimal example for our problem. Starting the Spark Shell worked just fine, but as soon as we ran a command that "triggered" the Hive integration, we'd get the stack trace above.
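The exact command isn't important; any statement that makes the shell talk to the Hive metastore would trigger it. A session along these lines is enough:

```shell
$ spark-shell
scala> spark.sql("SHOW TABLES").show()
```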
First we looked at the OpenJDK source code to see what triggers the error message "is not visible from class loader". It comes from the Proxy class, which asks the given classloader for each interface class and compares the result with the reference Class instance it was handed. If the two are different (they point to the same logical class, but not the same Class instance, so the identity comparison is false), the exception above is thrown. There is also this SO question which hits on the same issue and helped a little in understanding it.
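The situation is easy to reproduce outside of Spark. The sketch below is our own minimal illustration, not Spark or WANdisco code: it defines an interface, loads a second copy of it through a classloader that deliberately doesn't delegate for that name (the same effect Spark's isolated classloader has for non-shared classes), and then hands the system-loaded Class to Proxy.newProxyInstance along with the isolating loader:

```java
import java.io.InputStream;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class ProxyVisibilityDemo {

    public interface Greeter { String hi(); }

    // A classloader that defines its OWN copy of Greeter instead of
    // delegating to its parent -- analogous to what an isolating
    // classloader does for non-shared classes.
    static class IsolatingLoader extends ClassLoader {
        IsolatingLoader() { super(ProxyVisibilityDemo.class.getClassLoader()); }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (name.equals(Greeter.class.getName())) {
                String path = name.replace('.', '/') + ".class";
                try (InputStream in = getParent().getResourceAsStream(path)) {
                    byte[] bytes = in.readAllBytes();
                    // Defines a second, distinct Class instance for Greeter
                    return defineClass(name, bytes, 0, bytes.length);
                } catch (Exception e) {
                    throw new ClassNotFoundException(name, e);
                }
            }
            return super.loadClass(name, resolve);
        }
    }

    // Returns the exception message, or null if no exception was thrown.
    static String tryProxy() {
        InvocationHandler handler = (proxy, method, args) -> "hi";
        try {
            // Greeter.class was loaded by the system loader, but we pass the
            // isolating loader: Proxy resolves the name through it, gets the
            // OTHER Class instance, and the identity check fails.
            Proxy.newProxyInstance(new IsolatingLoader(),
                    new Class<?>[] { Greeter.class }, handler);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Prints a message containing "is not visible from class loader"
        System.out.println(tryProxy());
    }
}
```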
We (finally) found the issue by attaching jdb to a running Spark and breaking at the Proxy creation. There we saw that the two Class instances were indeed different because they came from two different classloaders: one from the default system classloader, the other from an IsolatedClientLoader from the Spark project.
Once we understood this, we were quick to find the solution to our problem. Here are the relevant JIRAs:
The last one held the key. It introduces a parameter called spark.sql.hive.metastore.sharedPrefixes (see the Spark documentation, section "Interacting with Different Versions of Hive Metastore") that lets you specify class-name prefixes that are shared between the main system classloader and the isolated classloader for Hive. We simply had to add com.wandisco here and everything worked again:
spark-shell --conf spark.sql.hive.metastore.sharedPrefixes=com.wandisco.
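To make the fix stick rather than passing --conf on every invocation, the same setting can go into spark-defaults.conf (the path and layout below are typical, but may differ on your distribution):

```
# conf/spark-defaults.conf
spark.sql.hive.metastore.sharedPrefixes  com.wandisco
```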
As of Spark 2.3 there are also a few hardcoded default prefixes that are shared in addition to the ones you specify.
While this is a very obscure error that probably won't affect many people, if you are stuck with it and find this blog post I hope it saves you a few hours of debugging!