Background and Symptoms
Our project uses SeaTunnel for data preprocessing, importing and exporting data in an init container. On machine A, SeaTunnel starts normally and the init container succeeds; on machine B it fails.
Frequent error: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.util.StringInterner
Occasional error: Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.thirdparty.com.google.common.collect.Interners
Investigation
Start with the first error: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.util.StringInterner.
Find which jars contain the failing class:
grep -rcl 'org.apache.hadoop.util.StringInterner'
seatunnel/lib/hadoop-common-3.3.6.jar
seatunnel/lib/seatunnel-hadoop3-3.1.4-uber.jar
Use reflection to try loading the class inside the packaged SeaTunnel container:
vi TestStringInterner.java
public class TestStringInterner {
    public static void main(String[] args) {
        try {
            // Try to load the StringInterner class
            Class.forName("org.apache.hadoop.util.StringInterner");
            System.out.println("StringInterner loaded successfully");
        } catch (Throwable e) {
            // Catch Throwable, not just ClassNotFoundException: a failed static
            // initializer surfaces as ExceptionInInitializerError / NoClassDefFoundError
            System.out.println("Failed to load StringInterner: " + e.getMessage());
        }
    }
}
# compile with javac; run it below
javac TestStringInterner.java
First, test on A:
# load with all dependency jars on the classpath ==== success
java -verbose:class \
-XX:+TraceClassLoading \
-cp ".:lib/*" \
TestStringInterner
# load with only the two jars found above ==== failure
java -verbose:class \
-XX:+TraceClassLoading \
-cp ".:lib/hadoop-common-3.3.6.jar:lib/seatunnel-hadoop3-3.1.4-uber.jar" \
TestStringInterner
The failure reproduces the occasional error seen earlier:
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.thirdparty.com.google.common.collect.Interners
Checking the Hadoop source on GitHub (https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/StringInterner.java): StringInterner calls Interners in its static field initializers, so that class is required the moment StringInterner is initialized.
# try swapping the jar order ==== success
java -verbose:class \
-XX:+TraceClassLoading \
-cp ".:lib/seatunnel-hadoop3-3.1.4-uber.jar:lib/hadoop-common-3.3.6.jar" \
TestStringInterner
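Besides -verbose:class, a quick way to see which copy of a duplicated class "wins" is to ask the loaded class for its code source. A small helper sketch (the class name passed on the command line is just an example):

```java
// Helper sketch: print which jar (code source) a class was loaded from.
// For application-classpath classes this is the winning jar; for JDK
// bootstrap classes the code source is null.
public class WhichJar {
    public static String locate(String className) {
        try {
            Class<?> c = Class.forName(className);
            java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "bootstrap/platform" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not found: " + className;
        }
    }

    public static void main(String[] args) {
        // e.g. java -cp ".:lib/*" WhichJar org.apache.hadoop.util.StringInterner
        System.out.println(locate(args.length > 0 ? args[0] : "java.lang.String"));
    }
}
```

Running it with both classpath orders would show StringInterner resolving from a different jar each time.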
# Why does swapping the order succeed? Does seatunnel-hadoop3-3.1.4-uber bundle guava while hadoop-common does not?
bash-4.4# jar tvf lib/seatunnel-hadoop3-3.1.4-uber.jar | grep -i guava
0 Thu Oct 18 14:17:14 CST 2018 META-INF/maven/com.google.guava/
0 Thu Oct 18 14:17:14 CST 2018 META-INF/maven/com.google.guava/guava/
133 Thu Oct 18 14:17:14 CST 2018 META-INF/maven/com.google.guava/guava/pom.properties
8118 Thu Oct 18 14:07:06 CST 2018 META-INF/maven/com.google.guava/guava/pom.xml
0 Tue Jan 01 03:00:00 CST 1980 META-INF/maven/com.google.guava/failureaccess/
1671 Tue Jan 01 03:00:00 CST 1980 META-INF/maven/com.google.guava/failureaccess/pom.xml
93 Tue Sep 11 15:39:26 CST 2018 META-INF/maven/com.google.guava/failureaccess/pom.properties
0 Tue Jan 01 03:00:00 CST 1980 META-INF/maven/com.google.guava/listenablefuture/
2278 Tue Jan 01 03:00:00 CST 1980 META-INF/maven/com.google.guava/listenablefuture/pom.xml
134 Tue Sep 11 15:40:36 CST 2018 META-INF/maven/com.google.guava/listenablefuture/pom.properties
bash-4.4# jar tvf lib/hadoop-common-3.3.6.jar | grep -i guava
bash-4.4#
So that is it: seatunnel-hadoop3-3.1.4-uber bundles guava, hadoop-common does not, and StringInterner's static initializers run as soon as the class is initialized.
When hadoop-common is loaded first, its copy of StringInterner wins, the guava class cannot be found, and StringInterner's class initialization fails.
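This also explains the two different error messages: the first attempt to initialize the class throws the underlying cause (an ExceptionInInitializerError wrapping the ClassNotFoundException), the JVM marks the class as erroneous, and every later reference reports "Could not initialize class" as a NoClassDefFoundError. A minimal sketch using a hypothetical class whose static initializer always fails, mimicking StringInterner missing the guava Interners class:

```java
// Minimal sketch of JVM class-initialization failure semantics (JLS 12.4.2).
public class InitFailureDemo {
    static class BadInit {
        static {
            // hypothetical stand-in for StringInterner's failing guava lookup
            if (true) throw new RuntimeException("dependency missing");
        }
    }

    // Returns the simple name of the error raised when touching BadInit.
    static String attempt() {
        try {
            new BadInit();
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println("first:  " + attempt()); // ExceptionInInitializerError
        System.out.println("second: " + attempt()); // NoClassDefFoundError ("Could not initialize class ...")
    }
}
```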
At first this looked like a class-loading mechanism problem:
SeaTunnel's launch command loads its dependency jars via a classpath wildcard:
`java -Dcfg=cfg -cp /path/*`. After a round of googling plus an LLM, the mechanism itself was ruled out; comparing with the different-order experiments above,
the order of the files expanded from * turned out to be the real cause.
So: is the order of files in the * expansion a file-system issue, with seatunnel-hadoop3-3.1.4-uber/hadoop-common coming out reversed on B?
bash-4.4# df -T .
Filesystem Type 1K-blocks Used Available Use% Mounted on
overlay overlay 2256141992 261706012 1994435980 12% /
# inside a docker container df -T always reports overlay, so exit and check the host file system:
# A
Filesystem Type 1K-blocks Used Available Use% Mounted on
/dev/nvme0n1p1 ext4 1967874324 1702433868 165404192 92% /home
# B
(base) [root@g1055 ~]# df -T .
Filesystem            Type 1K-blocks      Used  Available Use% Mounted on
/dev/mapper/klas-root xfs 2256141992 237719688 2018422304 11% /
What searching turns up: ext4 (with dir_index enabled) returns directory entries in hash order, which looks essentially random, while xfs returns them roughly in inode order; with very large directories the strategy can change further.
ls -f does not sort the results and prints only minimal information; -i shows each entry's inode id.
# A
bash-4.4# ls -fi
88773982 . 88773986 postgresql-42.7.8.jar 88773984 hadoop-common-3.3.6.jar
88773983 DmJdbcDriver8.jar 88773985 mysql-connector-j-8.0.33.jar 88773930 ..
88773989 seatunnel-transforms-v2.jar 88773987 seatunnel-hadoop3-3.1.4-uber.jar 88773988 seatunnel-hadoop-aws.jar
# completely unordered, no obvious rule; compare with the sorted -l output
bash-4.4# ls -li
total 201196
88773983 -rw-rw-r-- 1 1000 1000  1653596 Dec 10 10:43 DmJdbcDriver8.jar
88773984 -rw-r--r-- 1 1000 1000  4603101 Dec  9 17:51 hadoop-common-3.3.6.jar
88773985 -rw-rw-r-- 1 1000 1000  2481560 Dec  9 17:51 mysql-connector-j-8.0.33.jar
88773986 -rw-r--r-- 1 1000 1000  1116727 Dec  9 17:51 postgresql-42.7.8.jar
88773987 -rw-r--r-- 1 1000 1000 43209805 Dec  9 17:51 seatunnel-hadoop3-3.1.4-uber.jar
88773988 -rw-r--r-- 1 1000 1000 86818044 Dec  9 17:51 seatunnel-hadoop-aws.jar
88773989 -rw-r--r-- 1 1000 1000 66131549 Dec  9 17:51 seatunnel-transforms-v2.jar
# sorted by name, the inode ids happen to be ascending
# B xfs
bash-4.4# ls -fi
30526065 . 30526067 hadoop-common-3.3.6.jar 30526070 seatunnel-hadoop-aws.jar
30526044 .. 30526068 mysql-connector-j-8.0.33.jar 30526071 seatunnel-hadoop3-3.1.4-uber.jar
30526066 DmJdbcDriver8.jar 30526069 postgresql-42.7.8.jar 30526072 seatunnel-transforms-v2.jar
# on xfs the listing is ordered even with -f.
bash-4.4# ls -li
total 201196
30526066 -rw-rw-r-- 1 1000 1000  1653596 Dec 10 10:43 DmJdbcDriver8.jar
30526067 -rw-r--r-- 1 1000 1000  4603101 Dec  9 17:51 hadoop-common-3.3.6.jar
30526068 -rw-rw-r-- 1 1000 1000  2481560 Dec  9 17:51 mysql-connector-j-8.0.33.jar
30526069 -rw-r--r-- 1 1000 1000  1116727 Dec  9 17:51 postgresql-42.7.8.jar
30526071 -rw-r--r-- 1 1000 1000 43209805 Dec  9 17:51 seatunnel-hadoop3-3.1.4-uber.jar
30526070 -rw-r--r-- 1 1000 1000 86818044 Dec  9 17:51 seatunnel-hadoop-aws.jar
30526072 -rw-r--r-- 1 1000 1000 66131549 Dec  9 17:51 seatunnel-transforms-v2.jar
This explains everything: on xfs, hadoop-common always comes first in the wildcard expansion, so its StringInterner wins with no guava dependency available, and loading fails. The Java launcher takes the file list exactly as the file system returns it, with no sorting of its own.
Next, a look at how java -cp resolves jar dependencies under the hood: an LLM says it goes through the Java File API, so write an example to verify:
vi ClassPathResolver.java
import java.io.File;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
public class ClassPathResolver {
    public static List<String> resolveClassPath(String path) {
        List<String> classPaths = new ArrayList<>();
        // Handle a wildcard path
        if (path.endsWith("/*")) {
            String dirPath = path.substring(0, path.length() - 2);
            File dir = new File(dirPath);
            if (dir.exists() && dir.isDirectory()) {
                File[] files = dir.listFiles((d, name) -> name.toLowerCase().endsWith(".jar"));
                if (files != null) {
                    for (File file : files) {
                        classPaths.add(file.getAbsolutePath());
                    }
                }
            }
        } else {
            // Handle a plain path
            classPaths.add(path);
        }
        return classPaths;
    }

    public static void main(String[] args) {
        // Test code
        String path = "lib/*";
        List<String> paths = resolveClassPath(path);
        System.out.println("JAR files found:");
        paths.forEach(System.out::println);
    }
}
javac ClassPathResolver.java && java ClassPathResolver
# output on B (xfs):
JAR files found:
/home/pro/third_party/seatunnel/lib/DmJdbcDriver8.jar
/home/pro/third_party/seatunnel/lib/hadoop-common-3.3.6.jar
/home/pro/third_party/seatunnel/lib/mysql-connector-j-8.0.33.jar
/home/pro/third_party/seatunnel/lib/postgresql-42.7.8.jar
/home/pro/third_party/seatunnel/lib/seatunnel-hadoop-aws.jar
/home/pro/third_party/seatunnel/lib/seatunnel-hadoop3-3.1.4-uber.jar
/home/pro/third_party/seatunnel/lib/seatunnel-transforms-v2.jar
# output on A (ext4):
JAR files found:
/home/pro/third_party/seatunnel/lib/DmJdbcDriver8.jar
/home/pro/third_party/seatunnel/lib/seatunnel-transforms-v2.jar
/home/pro/third_party/seatunnel/lib/postgresql-42.7.8.jar
/home/pro/third_party/seatunnel/lib/mysql-connector-j-8.0.33.jar
/home/pro/third_party/seatunnel/lib/seatunnel-hadoop3-3.1.4-uber.jar
/home/pro/third_party/seatunnel/lib/hadoop-common-3.3.6.jar
/home/pro/third_party/seatunnel/lib/seatunnel-hadoop-aws.jar
Under the hood, listFiles delegates to a native method (java.io.UnixFileSystem on Linux), so the order is whatever the file system returns:
@Override
public native String[] list(File file);
At this point the problem is fully understood. The fix is to control the jar order manually so that seatunnel-hadoop3-3.1.4-uber comes before hadoop-common, for example by listing the jars explicitly in -cp instead of relying on the wildcard.
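One way to sketch this fix: build the -cp string explicitly with a deterministic order instead of using the wildcard. The helper below is illustrative only (the class name and priority list are not part of SeaTunnel); it puts jars matching a priority prefix first, then sorts the rest by name:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: build a deterministic, explicitly ordered classpath
// string instead of relying on the unordered "lib/*" wildcard expansion.
public class OrderedClassPath {
    // Jar-name prefixes that must come first on the classpath.
    private static final List<String> PRIORITY = List.of("seatunnel-hadoop3");

    public static String build(List<String> jarNames) {
        List<String> sorted = new ArrayList<>(jarNames);
        // Priority prefixes sort first (false < true), then by name.
        sorted.sort(Comparator
                .comparing((String n) -> PRIORITY.stream().noneMatch(n::startsWith))
                .thenComparing(Comparator.naturalOrder()));
        return String.join(File.pathSeparator, sorted);
    }

    public static void main(String[] args) {
        System.out.println(build(List.of(
                "hadoop-common-3.3.6.jar",
                "seatunnel-hadoop3-3.1.4-uber.jar",
                "postgresql-42.7.8.jar")));
    }
}
```

The resulting string can then be passed to java -cp directly, making the load order independent of the host file system.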