Dalvik VM - Class Loading

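This series on the Dalvik virtual machine aims to cover VM startup, class loading and initialization, and bytecode interpretation. Class loading is something we rely on constantly but rarely trigger explicitly. This article starts from ClassLoader.loadClass and walks through the concrete loading process, which also makes it easier to understand hotpatch-style features that work by modifying class definitions at run time.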

1. How an APK is loaded

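Let us first look at loading classes explicitly through a ClassLoader. It is commonly said that an APK is loaded by a DexClassLoader, so the first question is: where is an APK loaded, and by which class loader exactly?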

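From the earlier article "Starting a service in a new process via startService (Part 1)" we know that when a new process is needed to host an application, startProcessLocked in AMS is called, and it ultimately asks the Zygote process to fork the target process through Process.start: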
<center>

process.start.png

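Figure 1.1 From Process.start() to the creation of the new process (Android 4.4)</center>
Figure 1.1 shows the path from Process.start() to the new process. Step 7 forks the new process, and the target APK is loaded in step 9, after the process has been created. Expanding handleChildProc: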

private void handleChildProc(Arguments parsedArgs,
        FileDescriptor[] descriptors, FileDescriptor pipeFd, PrintStream newStderr)
        throws ZygoteInit.MethodAndArgsCaller {

    ......

    if (parsedArgs.runtimeInit) {
        ......
    } else {
        String className;
        try {
            className = parsedArgs.remainingArgs[0];
        } catch (ArrayIndexOutOfBoundsException ex) {
            logAndPrintError(newStderr,
                    "Missing required class name argument", null);
            return;
        }

        String[] mainArgs = new String[parsedArgs.remainingArgs.length - 1];
        System.arraycopy(parsedArgs.remainingArgs, 1,
                mainArgs, 0, mainArgs.length);

        if (parsedArgs.invokeWith != null) {
            WrapperInit.execStandalone(parsedArgs.invokeWith,
                    parsedArgs.classpath, className, mainArgs);
        } else {
            ClassLoader cloader;
            if (parsedArgs.classpath != null) {
*               cloader = new PathClassLoader(parsedArgs.classpath,
                        ClassLoader.getSystemClassLoader());
            } else {
                cloader = ClassLoader.getSystemClassLoader();
            }

            try {
                ZygoteInit.invokeStaticMain(cloader, className, mainArgs);
            } catch (RuntimeException ex) {
                logAndPrintError(newStderr, "Error starting.", ex);
            }
        }
    }
}

The line marked with * in the code above shows that the ClassLoader that loads all classes in the APK is a PathClassLoader. Like DexClassLoader, it derives from BaseDexClassLoader.

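2. Loading a class

2.1 Basic flow

Once the APK's ClassLoader has been set, every class in the APK (excluding dex packages loaded dynamically from code) is loaded by it. We start from PathClassLoader.loadClass. Since PathClassLoader does not override loadClass, the call lands in ClassLoader.loadClass: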
<center>

loadClass.png

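Figure 2.1 The loadClass flow</center>
Figure 2.1 shows the basic flow of loading a class. Android does not support the defineClass method of the base Java class ClassLoader; DexFile's defineClass is called instead, which then enters the native layer and runs findClassNoInit in Class.cpp. findClassNoInit looks up and loads the class but does not initialize it. Its declaration is: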

ClassObject* findClassNoInit(const char* descriptor, Object* loader, DvmDex* pDvmDex);

The third parameter of findClassNoInit is a DvmDex, which represents the loaded Dex file. It is passed in from Dalvik_dalvik_system_DexFile_defineClassNative, step 7 in Figure 2.1:

static void Dalvik_dalvik_system_DexFile_defineClassNative(const u4* args,
    JValue* pResult)
{
    ......
    if (pDexOrJar->isDex)
        pDvmDex = dvmGetRawDexFileDex(pDexOrJar->pRawDexFile);
    ......
}

It is not created here, though: dvmGetRawDexFileDex merely returns the DvmDex object already stored in the pDexOrJar->pRawDexFile structure. The next subsection looks at how the Dex file is loaded.
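ClassLoader.loadClass follows the usual parent-first delegation: check what this loader has already loaded, ask the parent, and only then try to define the class itself. The following is a minimal sketch of that order, not the Android implementation. ToyLoader and its members are hypothetical names, and a set of descriptors stands in for the dex file.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy model of a class loader; every name here is illustrative only.
struct ToyLoader {
    std::string name;
    ToyLoader* parent;                          // nullptr: no parent to delegate to
    std::set<std::string> dex;                  // descriptors this loader is able to define
    std::map<std::string, std::string> loaded;  // descriptor -> name of the defining loader

    // Returns the name of the loader that defined the class, or "" if nobody can.
    std::string loadClass(const std::string& descriptor) {
        // 1. Already loaded by this loader? (the findLoadedClass step)
        auto it = loaded.find(descriptor);
        if (it != loaded.end()) return it->second;

        // 2. Delegate to the parent first.
        if (parent != nullptr) {
            std::string owner = parent->loadClass(descriptor);
            if (!owner.empty()) return owner;
        }

        // 3. Fall back to our own dex (the findClass / defineClass step).
        if (dex.count(descriptor) != 0) {
            loaded[descriptor] = name;
            return name;
        }
        return "";
    }
};
```

With a boot loader as the parent of an application loader, a request for Ljava/lang/Object; is answered by the boot loader even if the application's dex contains a class with the same descriptor.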

2.2 Loading the Dex file

Let us first look at the DvmDex structure:

struct DvmDex {
    /* pointer to the DexFile we're associated with */
    DexFile*            pDexFile;

    /* clone of pDexFile->pHeader (it's used frequently enough) */
    const DexHeader*    pHeader;

    /* interned strings; parallel to "stringIds" */
    struct StringObject** pResStrings;

    /* resolved classes; parallel to "typeIds" */
    struct ClassObject** pResClasses;

    /* resolved methods; parallel to "methodIds" */
    struct Method**     pResMethods;

    /* resolved instance fields; parallel to "fieldIds" */
    /* (this holds both InstField and StaticField) */
    struct Field**      pResFields;

    /* interface method lookup cache */
    struct AtomicCache* pInterfaceCache;

    /* shared memory region with file contents */
    bool                isMappedReadOnly;
    MemMapping          memMap;

    jobject dex_object;

    /* lock ensuring mutual exclusion during updates */
    pthread_mutex_t     modLock;
};

DvmDex is actually created when the ClassLoader is constructed, and the ClassLoader in question is the PathClassLoader.
<center>

findClassNoInit.png

Figure 2.2 Dex file loading during ClassLoader creation</center>
Figure 2.2 shows how the Dex file is loaded while the PathClassLoader is being created (Java layer); the call chain is not repeated here. openDexFileNative is a native function implemented in dalvik/vm/native/dalvik_system_DexFile.cpp as Dalvik_dalvik_system_DexFile_openDexFileNative():

static void Dalvik_dalvik_system_DexFile_openDexFileNative(const u4* args,
    JValue* pResult)
{
    ......

    /*
     * Try to open it directly as a DEX if the name ends with ".dex".
     * If that fails (or isn't tried in the first place), try it as a
     * Zip with a "classes.dex" inside.
     */
    if (hasDexExtension(sourceName)
1.          && dvmRawDexFileOpen(sourceName, outputName, &pRawDexFile, false) == 0) {
        ALOGV("Opening DEX file '%s' (DEX)", sourceName);

        pDexOrJar = (DexOrJar*) malloc(sizeof(DexOrJar));
        pDexOrJar->isDex = true;
        pDexOrJar->pRawDexFile = pRawDexFile;
        pDexOrJar->pDexMemory = NULL;
    } else if (dvmJarFileOpen(sourceName, outputName, &pJarFile, false) == 0) {
        ......
    } else {
        ......
    }

    if (pDexOrJar != NULL) {
        pDexOrJar->fileName = sourceName;
2.      addToDexFileTable(pDexOrJar);
    } else {
        free(sourceName);
    }

    free(outputName);
    RETURN_PTR(pDexOrJar);
}

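Step 1: open the dex file, optimize it and load it (dvmRawDexFileOpen). Loading a jar or apk works the same way, with one extra step that extracts classes.dex from the archive;
Step 2: add the newly created DexOrJar object to the hash table of "dex files loaded by the user" (addToDexFileTable).
Here we only look at dvmRawDexFileOpen: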

int dvmRawDexFileOpen(const char* fileName, const char* odexOutputName,
    RawDexFile** ppRawDexFile, bool isBootstrap) // odexOutputName is the path of the odex file
{
    ......

1.  dexFd = open(fileName, O_RDONLY);
    ......

    if (odexOutputName == NULL) {
        cachedName = dexOptGenerateCacheFileName(fileName, NULL);
        if (cachedName == NULL)
            goto bail;
    } else {
2.      cachedName = strdup(odexOutputName);
    }

    ALOGV("dvmRawDexFileOpen: Checking cache for %s (%s)",
            fileName, cachedName);

3.  optFd = dvmOpenCachedDexFile(fileName, cachedName, modTime,
        adler32, isBootstrap, &newFile, /*createIfMissing=*/true);

    if (optFd < 0) {
        ALOGI("Unable to open or create cache for %s (%s)",
                fileName, cachedName);
        goto bail;
    }
    locked = true;

    /*
     * If optFd points to a new file (because there was no cached
     * version, or the cached version was stale), generate the
     * optimized DEX. The file descriptor returned is still locked,
     * and is positioned just past the optimization header.
     */
    if (newFile) {
        u8 startWhen, copyWhen, endWhen;
        bool result;
        off_t dexOffset;

        dexOffset = lseek(optFd, 0, SEEK_CUR);
        result = (dexOffset > 0);

        if (result) {
            startWhen = dvmGetRelativeTimeUsec();
4.          result = copyFileToFile(optFd, dexFd, fileSize) == 0;
            copyWhen = dvmGetRelativeTimeUsec();
        }

        if (result) {
5.          result = dvmOptimizeDexFile(optFd, dexOffset, fileSize,
                fileName, modTime, adler32, isBootstrap);
        }

        if (!result) {
            ALOGE("Unable to extract+optimize DEX from '%s'", fileName);
            goto bail;
        }

        endWhen = dvmGetRelativeTimeUsec();
        ALOGD("DEX prep '%s': copy in %dms, rewrite %dms",
            fileName,
            (int) (copyWhen - startWhen) / 1000,
            (int) (endWhen - copyWhen) / 1000);
    }

    /*
     * Map the cached version.  This immediately rewinds the fd, so it
     * doesn't have to be seeked anywhere in particular.
     */
6.  if (dvmDexFileOpenFromFd(optFd, &pDvmDex) != 0) {
        ALOGI("Unable to map cached %s", fileName);
        goto bail;
    }

    if (locked) {
        /* unlock the fd */
        if (!dvmUnlockCachedDexFile(optFd)) {
            /* uh oh -- this process needs to exit or we'll wedge the system */
            ALOGE("Unable to unlock DEX file");
            goto bail;
        }
        locked = false;
    }

    ALOGV("Successfully opened '%s'", fileName);

    *ppRawDexFile = (RawDexFile*) calloc(1, sizeof(RawDexFile));
    (*ppRawDexFile)->cacheFileName = cachedName;
7.  (*ppRawDexFile)->pDvmDex = pDvmDex;
    cachedName = NULL;      // don't free it below
    result = 0;

bail:
    free(cachedName);
    if (dexFd >= 0) {
        close(dexFd);
    }
    if (optFd >= 0) {
        if (locked)
            (void) dvmUnlockCachedDexFile(optFd);
        close(optFd);
    }
    return result;
}

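Step 1: open the dex file; the file descriptor is dexFd;
Step 2: cachedName is a copy of odexOutputName, i.e. the path of the odex (optimized DEX) file. When odexOutputName is NULL, a cache file name is generated instead;
Step 3: open the odex file, creating it if it does not exist. A new file gets its header information written; an existing file has its header validated;
Step 4: copy the entire content of the dex file into the odex file, right after the header;
Step 5: optimize the dex file. In short, this runs the command-line program dexopt (bin/dexopt under the Android root, normally /system/bin/dexopt). One important part of the work builds a hash lookup table over the class definitions in the Dex file. The optimized content is still stored in the odex file;
Step 6: map the odex content into pDvmDex. What is actually mapped is the DexFile structure; DvmDex contains the DexFile plus the classes, methods and fields that have already been resolved. The mapping process is not expanded here, only the layout that DexFile maps onto: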
<center>

dex-file-general-structure-3.png

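Figure 2.3 Layout mapped by DexFile</center>
As Figure 2.3 shows (the figure is borrowed from another source), six tables follow the Dex Header: String, Type, Proto, Field, Method and Class Def. They correspond to the following members of the DexFile structure: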

/*
 * Structure representing a DEX file.
 *
 * Code should regard DexFile as opaque, using the API calls provided here
 * to access specific structures.
 */
struct DexFile {
    /* directly-mapped "opt" header */
    const DexOptHeader* pOptHeader;

    /* pointers to directly-mapped structs and arrays in base DEX */
    const DexHeader*    pHeader;
    const DexStringId*  pStringIds;
    const DexTypeId*    pTypeIds;
    const DexFieldId*   pFieldIds;
    const DexMethodId*  pMethodIds;
    const DexProtoId*   pProtoIds;
    const DexClassDef*  pClassDefs;
    ......
};

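These six tables are simply mapped as arrays of structures. The variables whose names end in idx later in this article, as well as the CLASS_IDX class-loading status, all refer to indices into these arrays.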

Step 7: store pDvmDex in the RawDexFile structure.
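Steps 3 to 5 amount to a cache decision: reuse the odex file if its header still matches the source dex, otherwise regenerate it. The following is a rough sketch of that decision only. ToyOptHeader, kToyMagic and checkCache are hypothetical, and the assumption that the header is compared against the modTime and adler32 arguments is inferred from the parameters passed to dvmOpenCachedDexFile, not taken from its implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the data kept in an odex header.
struct ToyOptHeader {
    uint32_t magic;     // identifies the file as an optimized dex
    uint32_t modTime;   // modification time of the source dex when the cache was built
    uint32_t checksum;  // checksum of the source dex (for example adler32)
};

const uint32_t kToyMagic = 0x6f646578;  // arbitrary value chosen for this sketch

enum class CacheAction { UseExisting, Regenerate };

// Decide whether an existing cache file can be reused.
// hasCache == false models "no cached file exists yet".
CacheAction checkCache(bool hasCache, const ToyOptHeader& header,
                       uint32_t srcModTime, uint32_t srcChecksum) {
    if (!hasCache) return CacheAction::Regenerate;
    if (header.magic != kToyMagic) return CacheAction::Regenerate;
    if (header.modTime != srcModTime) return CacheAction::Regenerate;   // source changed
    if (header.checksum != srcChecksum) return CacheAction::Regenerate; // content changed
    return CacheAction::UseExisting;
}
```

Regenerate plays the role of newFile being true in dvmRawDexFileOpen, which triggers the copy in step 4 and the optimization in step 5; UseExisting goes straight to the mapping in step 6.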

The following figure shows how DvmDex, DexFile and RawDexFile contain one another:
<center>

微信截图_20160216002516.png

Figure 2.4 Containment relationship between DvmDex, DexFile and RawDexFile</center>
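To make the containment and the index-based tables more concrete, here is a small sketch under simplifying assumptions. All Toy* types and both helper functions are hypothetical: vectors replace the memory-mapped arrays, and only the type and class-def tables are modelled. It shows an id entry referring to another table by index, and a DvmDex-style cache that runs parallel to typeIds.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Toy id tables: an entry refers to another table by index, never by pointer.
struct ToyTypeId   { uint32_t descriptorIdx; };                     // index into strings
struct ToyClassDef { uint32_t classIdx; uint32_t superclassIdx; };  // indices into typeIds

struct ToyDexFile {
    std::vector<std::string> strings;
    std::vector<ToyTypeId>   typeIds;
    std::vector<ToyClassDef> classDefs;
};

struct ToyClass { std::string descriptor; };

// Like DvmDex: points at the DexFile and keeps a cache parallel to typeIds.
struct ToyDvmDex {
    ToyDexFile* pDexFile;
    std::vector<std::unique_ptr<ToyClass>> resClasses;  // empty slot = not resolved yet
};

// Like RawDexFile: one more wrapper around the DvmDex.
struct ToyRawDexFile { ToyDvmDex* pDvmDex; };

// Follow typeIdx -> descriptorIdx -> string.
const std::string& typeDescriptor(const ToyDexFile& dex, uint32_t typeIdx) {
    return dex.strings[dex.typeIds[typeIdx].descriptorIdx];
}

// Resolve a type index once and remember the result in the parallel cache.
ToyClass* resolveType(ToyDvmDex& dvmDex, uint32_t typeIdx) {
    if (!dvmDex.resClasses[typeIdx]) {
        dvmDex.resClasses[typeIdx].reset(
            new ToyClass{typeDescriptor(*dvmDex.pDexFile, typeIdx)});
    }
    return dvmDex.resClasses[typeIdx].get();
}
```

The cache has exactly one slot per typeIds entry, which is what "parallel to typeIds" in the DvmDex comments means: the same index addresses both arrays.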

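Going back to step 1 of Dalvik_dalvik_system_DexFile_openDexFileNative, the job of dvmRawDexFileOpen is to fill in the RawDexFile* pRawDexFile member of DexOrJar. The DexOrJar is then returned, and on the Java side it shows up as the int value DexFile.mCookie. This corresponds to step 6 in Figure 2.2 (the DexFile constructor):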

public DexFile(String fileName) throws IOException {
    mCookie = openDexFile(fileName, null, 0);
    mFileName = fileName;
    guard.open("close");
    //System.out.println("DEX FILE cookie is " + mCookie);
}
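The cookie is therefore a native pointer handed to Java as an opaque integer. The following is a hedged sketch of that pattern. ToyDexOrJar, gUserDexFiles, openToyDex and fromCookie are hypothetical names, and checking the cookie against the table before using it is an assumption about why such a table is useful, not a description of the Dalvik code. The sketch uses std::intptr_t so that it also works on 64-bit hosts, whereas the article's mCookie is a 32-bit int.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

struct ToyDexOrJar { bool isDex; };

// Stand-in for the table of dex files opened by the user.
static std::set<const ToyDexOrJar*> gUserDexFiles;

// "Open": register the object and hand its address out as an opaque cookie.
std::intptr_t openToyDex(ToyDexOrJar* p) {
    gUserDexFiles.insert(p);
    return reinterpret_cast<std::intptr_t>(p);
}

// Convert a cookie back into a pointer only if it is one we handed out.
ToyDexOrJar* fromCookie(std::intptr_t cookie) {
    ToyDexOrJar* p = reinterpret_cast<ToyDexOrJar*>(cookie);
    return gUserDexFiles.count(p) != 0 ? p : nullptr;
}
```

Keeping a table of known pointers lets the native side reject an integer that was never returned by an open call instead of dereferencing it blindly.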

2.3 findClassNoInit

Section 2.2 covered DvmDex and DexOrJar; this section returns to findClassNoInit:
<center>

findClassNoInit.png

Figure 2.5 The findClassNoInit flow</center>
Figure 2.5 shows the flow of findClassNoInit. The key code follows; steps 1 to 5 correspond to the five parts of the flow chart above:

static ClassObject* findClassNoInit(const char* descriptor, Object* loader,
    DvmDex* pDvmDex)
{
    Thread* self = dvmThreadSelf();
    ClassObject* clazz;
    ......

1.  clazz = dvmLookupClass(descriptor, loader, true);
    if (clazz == NULL) {
        const DexClassDef* pClassDef;

        ......

        if (pDvmDex == NULL) {
            assert(loader == NULL);     /* shouldn't be here otherwise */
            pDvmDex = searchBootPathForClass(descriptor, &pClassDef);
        } else {
2.          pClassDef = dexFindClass(pDvmDex->pDexFile, descriptor);
        }

        ......

        /* found a match, try to load it */
3.      clazz = loadClassFromDex(pDvmDex, pClassDef, loader);
        if (dvmCheckException(self)) {
            /* class was found but had issues */
            if (clazz != NULL) {
                dvmFreeClassInnards(clazz);
                dvmReleaseTrackedAlloc((Object*) clazz, NULL);
            }
            goto bail;
        }

        /*
         * Lock the class while we link it so other threads must wait for us
         * to finish.  Set the "initThreadId" so we can identify recursive
         * invocation.  (Note all accesses to initThreadId here are
         * guarded by the class object's lock.)
         */
        dvmLockObject(self, (Object*) clazz);
        clazz->initThreadId = self->threadId;

        /*
         * Add to hash table so lookups succeed.
         *
         * [Are circular references possible when linking a class?]
         */
        assert(clazz->classLoader == loader);
4.      if (!dvmAddClassToHash(clazz)) {
            /*
             * Another thread must have loaded the class after we
             * started but before we finished.  Discard what we've
             * done and leave some hints for the GC.
             *
             * (Yes, this happens.)
             */
            //ALOGW("WOW: somebody loaded %s simultaneously", descriptor);
            clazz->initThreadId = 0;
            dvmUnlockObject(self, (Object*) clazz);

            /* Let the GC free the class.
             */
            dvmFreeClassInnards(clazz);
            dvmReleaseTrackedAlloc((Object*) clazz, NULL);

            /* Grab the winning class.
             */
            clazz = dvmLookupClass(descriptor, loader, true);
            assert(clazz != NULL);
            goto got_class;
        }
        dvmReleaseTrackedAlloc((Object*) clazz, NULL);

#if LOG_CLASS_LOADING
        logClassLoadWithTime('>', clazz, startTime);
#endif
        /*
         * Prepare and resolve.
         */
5.      if (!dvmLinkClass(clazz)) {
            ......
        }
        dvmObjectNotifyAll(self, (Object*) clazz);
        dvmUnlockObject(self, (Object*) clazz);

        /*
         * Add class stats to global counters.
         *
         * TODO: these should probably be atomic ops.
         */
        gDvm.numLoadedClasses++;
        gDvm.numDeclaredMethods +=
            clazz->virtualMethodCount + clazz->directMethodCount;
        gDvm.numDeclaredInstFields += clazz->ifieldCount;
        gDvm.numDeclaredStaticFields += clazz->sfieldCount;

        /*
         * Cache pointers to basic classes.  We want to use these in
         * various places, and it's easiest to initialize them on first
         * use rather than trying to force them to initialize (startup
         * ordering makes it weird).
         */
        if (gDvm.classJavaLangObject == NULL &&
            strcmp(descriptor, "Ljava/lang/Object;") == 0)
        {
            /* It should be impossible to get here with anything
             * but the bootclasspath loader.
             */
            assert(loader == NULL);
            gDvm.classJavaLangObject = clazz;
        }

#if LOG_CLASS_LOADING
        logClassLoad('<', clazz);
#endif

    } else {
got_class:
        ......
    }

    ......
    return clazz;
}

2.3.1 dvmLookupClass

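First, the classes already loaded by the given class loader are searched; if the class is found, the loading logic is skipped. The first step of loadClass likewise checks whether this class loader has already loaded the class (findLoadedClass), which ends up calling dvmLookupClass just like here: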

ClassObject* dvmLookupClass(const char* descriptor, Object* loader,
    bool unprepOkay)
{
    ClassMatchCriteria crit;
    void* found;
    u4 hash;

    crit.descriptor = descriptor;
    crit.loader = loader;
    ......
1.  found = dvmHashTableLookup(gDvm.loadedClasses, hash, &crit,
                hashcmpClassByCrit, false);
    ......
2.  if (found && !unprepOkay && !dvmIsClassLinked((ClassObject*)found)) {
        ALOGV("Ignoring not-yet-ready %s, using slow path",
            ((ClassObject*)found)->descriptor);
        found = NULL;
    }

    return (ClassObject*) found;
}

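Step 1: look the class up in the hash table of loaded classes (gDvm.loadedClasses). Being present in the table does not by itself mean the class has finished loading. The lookup key is a ClassMatchCriteria, defined as: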

struct ClassMatchCriteria {
    const char* descriptor;
    Object*     loader;
};

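ClassMatchCriteria shows that a match requires both the class descriptor and the loader to be identical.
Step 2: if a match is found and the caller passed unprepOkay as false, check whether the class has been linked. A linked class counts as loaded; an unlinked one is treated as not found. findClassNoInit passes true, so it also accepts a class that has not been linked yet: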

INLINE bool dvmIsClassLinked(const ClassObject* clazz) {
    return clazz->status >= CLASS_RESOLVED;
}

dvmIsClassLinked is an inline function that checks whether the class status has reached at least CLASS_RESOLVED. Not counting CLASS_ERROR, the status field of ClassObject has 8 states:

enum ClassStatus {
    CLASS_ERROR         = -1,

    CLASS_NOTREADY      = 0,
    CLASS_IDX           = 1,    /* loaded, DEX idx in super or ifaces */
    CLASS_LOADED        = 2,    /* DEX idx values resolved */
    CLASS_RESOLVED      = 3,    /* part of linking */
    CLASS_VERIFYING     = 4,    /* in the process of being verified */
    CLASS_VERIFIED      = 5,    /* logically part of linking; done pre-init */
    CLASS_INITIALIZING  = 6,    /* class init in progress */
    CLASS_INITIALIZED   = 7,    /* ready to go */
};

Later sections explain when each status is assigned.
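The two rules of dvmLookupClass, matching on descriptor plus loader and optionally rejecting classes that are not yet linked, can be sketched as follows. This is a simplified model, not the VM code: std::map replaces the VM hash table, an integer loaderId stands in for the loader object, and all Toy* names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

enum ToyStatus { TOY_NOTREADY = 0, TOY_IDX = 1, TOY_LOADED = 2, TOY_RESOLVED = 3 };

struct ToyClass {
    std::string descriptor;
    int loaderId;       // stand-in for the defining class loader
    ToyStatus status;
};

// The key mirrors ClassMatchCriteria: descriptor and loader must both match.
using Key = std::pair<std::string, int>;
static std::map<Key, ToyClass*> gLoaded;

bool isLinked(const ToyClass* c) { return c->status >= TOY_RESOLVED; }

ToyClass* lookupClass(const std::string& descriptor, int loaderId, bool unprepOkay) {
    auto it = gLoaded.find(Key(descriptor, loaderId));
    if (it == gLoaded.end()) return nullptr;
    // Callers that cannot cope with a half-prepared class get "not found".
    if (!unprepOkay && !isLinked(it->second)) return nullptr;
    return it->second;
}
```

Two classes with the same descriptor but different loaders occupy separate entries, so from the VM's point of view they are different classes.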

2.3.2 dexFindClass

If dvmLookupClass shows that the class has not been loaded, dexFindClass first looks up the class definition in the loaded Dex file. The function lives in DexFile.cpp:

const DexClassDef* dexFindClass(const DexFile* pDexFile,
    const char* descriptor)
{
    const DexClassLookup* pLookup = pDexFile->pClassLookup;
    u4 hash;
    int idx, mask;

    hash = classDescriptorHash(descriptor);
    mask = pLookup->numEntries - 1;
1.  idx = hash & mask;

    /*
     * Search until we find a matching entry or an empty slot.
     */
    while (true) {
        int offset;

        offset = pLookup->table[idx].classDescriptorOffset;
2.      if (offset == 0)
            return NULL;

        if (pLookup->table[idx].classDescriptorHash == hash) {
            const char* str;

            str = (const char*) (pDexFile->baseAddr + offset);
            if (strcmp(str, descriptor) == 0) {
3.              return (const DexClassDef*)
                    (pDexFile->baseAddr + pLookup->table[idx].classDefOffset);
            }
        }

        idx = (idx + 1) & mask;
    }
}

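Step 1: DexFile::pClassLookup is a lookup table with one entry per class, built when the Dex file is loaded. The key is a hash computed from the class descriptor, and the value holds the offsets of the class descriptor and the class definition. This step computes the hash and the starting index into the table;
Step 2: an offset of 0 marks an empty slot, which means the probe sequence ended without a match, so NULL is returned;
Step 3: on a match, the class definition, a DexClassDef, is returned: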

struct DexClassDef {
    u4  classIdx;           /* index into typeIds for this class */
    u4  accessFlags;
    u4  superclassIdx;      /* index into typeIds for superclass */
    u4  interfacesOff;      /* file offset to DexTypeList */
    u4  sourceFileIdx;      /* index into stringIds for source file name */
    u4  annotationsOff;     /* file offset to annotations_directory_item */
    u4  classDataOff;       /* file offset to class_data_item */
    u4  staticValuesOff;    /* file offset to DexEncodedArray */
};
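dexFindClass is an open-addressing hash lookup with linear probing over a table whose size is a power of two, which is why `hash & mask` can replace a modulo. The following is a self-contained sketch of the same technique. ToyEntry, ToyLookup and toyHash are hypothetical, the hash function is chosen only for illustration, and strings replace the file offsets used by the real table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct ToyEntry {
    uint32_t hash = 0;
    std::string descriptor;   // empty string marks an unused slot
    int classDefIndex = -1;
};

// Simple multiplicative string hash, chosen for this sketch.
uint32_t toyHash(const std::string& s) {
    uint32_t h = 1;
    for (unsigned char c : s) h = h * 31 + c;
    return h;
}

struct ToyLookup {
    std::vector<ToyEntry> table;  // size must be a power of two and never completely full

    explicit ToyLookup(std::size_t powerOfTwoSize) : table(powerOfTwoSize) {}

    void add(const std::string& descriptor, int classDefIndex) {
        const uint32_t hash = toyHash(descriptor);
        const uint32_t mask = static_cast<uint32_t>(table.size()) - 1;
        uint32_t idx = hash & mask;
        while (!table[idx].descriptor.empty()) idx = (idx + 1) & mask;  // linear probing
        table[idx].hash = hash;
        table[idx].descriptor = descriptor;
        table[idx].classDefIndex = classDefIndex;
    }

    // Returns the class-def index, or -1 when an empty slot ends the probe sequence.
    int find(const std::string& descriptor) const {
        const uint32_t hash = toyHash(descriptor);
        const uint32_t mask = static_cast<uint32_t>(table.size()) - 1;
        uint32_t idx = hash & mask;
        while (true) {
            const ToyEntry& e = table[idx];
            if (e.descriptor.empty()) return -1;
            // Compare the cheap hash first, the full string only on a hash match.
            if (e.hash == hash && e.descriptor == descriptor) return e.classDefIndex;
            idx = (idx + 1) & mask;
        }
    }
};
```

The loop in find terminates only because at least one slot stays empty, so the table has to be sized larger than the number of classes it holds.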

2.3.3 loadClassFromDex

static ClassObject* loadClassFromDex(DvmDex* pDvmDex,
    const DexClassDef* pClassDef, Object* classLoader)
{
    ClassObject* result;
    DexClassDataHeader header;
    const u1* pEncodedData;
    const DexFile* pDexFile;

    assert((pDvmDex != NULL) && (pClassDef != NULL));
    pDexFile = pDvmDex->pDexFile;

    if (gDvm.verboseClass) {
        ALOGV("CLASS: loading '%s'...",
            dexGetClassDescriptor(pDexFile, pClassDef));
    }

1.  pEncodedData = dexGetClassData(pDexFile, pClassDef);

    if (pEncodedData != NULL) {
2.      dexReadClassDataHeader(&pEncodedData, &header);
    } else {
        // Provide an all-zeroes header for the rest of the loading.
        memset(&header, 0, sizeof(header));
    }

3.  result = loadClassFromDex0(pDvmDex, pClassDef, &header, pEncodedData,
            classLoader);

    if (gDvm.verboseClass && (result != NULL)) {
        ALOGI("[Loaded %s from DEX %p (cl=%p)]",
            result->descriptor, pDvmDex, classLoader);
    }

    return result;
}

Step 1: get a pointer to the ClassData;
Step 2: read the ClassData header;
Step 3: load the class using the information obtained in the first two steps:

static ClassObject* loadClassFromDex0(DvmDex* pDvmDex,
    const DexClassDef* pClassDef, const DexClassDataHeader* pHeader,
    const u1* pEncodedData, Object* classLoader)
{
    ClassObject* newClass = NULL;
    ......

    /*
     * Allocate storage for the class object on the GC heap, so that other
     * objects can have references to it.  We bypass the usual mechanism
     * (allocObject), because we don't have all the bits and pieces yet.
     *
     * Note that we assume that java.lang.Class does not override
     * finalize().
     */
    /* TODO: Can there be fewer special checks in the usual path? */
    assert(descriptor != NULL);
    if (classLoader == NULL &&
        strcmp(descriptor, "Ljava/lang/Class;") == 0) {
        assert(gDvm.classJavaLangClass != NULL);
        newClass = gDvm.classJavaLangClass;
    } else {
        size_t size = classObjectSize(pHeader->staticFieldsSize);
1.      newClass = (ClassObject*) dvmMalloc(size, ALLOC_NON_MOVING);
    }
    if (newClass == NULL)
        return NULL;

2.  DVM_OBJECT_INIT(newClass, gDvm.classJavaLangClass); // set the object's class pointer to java.lang.Class
    dvmSetClassSerialNumber(newClass); // assign the serialNumber
    newClass->descriptor = descriptor; // full class descriptor
    assert(newClass->descriptorAlloc == NULL);
    SET_CLASS_FLAG(newClass, pClassDef->accessFlags); // class access flags
3.  dvmSetFieldObject((Object *)newClass,
                      OFFSETOF_MEMBER(ClassObject, classLoader),
                      (Object *)classLoader); // set the ClassLoader
    newClass->pDvmDex = pDvmDex;
    newClass->primitiveType = PRIM_NOT;
    newClass->status = CLASS_IDX; // initial loading status is CLASS_IDX

    /*
     * Stuff the superclass index into the object pointer field.  The linker
     * pulls it out and replaces it with a resolved ClassObject pointer.
     * I'm doing it this way (rather than having a dedicated superclassIdx
     * field) to save a few bytes of overhead per class.
     *
     * newClass->super is not traversed or freed by dvmFreeClassInnards, so
     * this is safe.
     */
    assert(sizeof(u4) == sizeof(ClassObject*)); /* 32-bit check */
4.  newClass->super = (ClassObject*) pClassDef->superclassIdx;

    /*
     * Stuff class reference indices into the pointer fields.
     *
     * The elements of newClass->interfaces are not traversed or freed by
     * dvmFreeClassInnards, so this is GC-safe.
     */
    const DexTypeList* pInterfacesList;
5.  pInterfacesList = dexGetInterfacesList(pDexFile, pClassDef);
    if (pInterfacesList != NULL) {
        newClass->interfaceCount = pInterfacesList->size;
        newClass->interfaces = (ClassObject**) dvmLinearAlloc(classLoader,
                newClass->interfaceCount * sizeof(ClassObject*));

        for (i = 0; i < newClass->interfaceCount; i++) {
            const DexTypeItem* pType = dexGetTypeItem(pInterfacesList, i);
            newClass->interfaces[i] = (ClassObject*)(u4) pType->typeIdx;
        }
        dvmLinearReadOnly(classLoader, newClass->interfaces);
    }

    /* load field definitions */

    /*
     * Over-allocate the class object and append static field info
     * onto the end.  It's fixed-size and known at alloc time.  This
     * seems to increase zygote sharing.  Heap compaction will have to
     * be careful if it ever tries to move ClassObject instances,
     * because we pass Field pointers around internally. But at least
     * now these Field pointers are in the object heap.
     */

6.  if (pHeader->staticFieldsSize != 0) {
        /* static fields stay on system heap; field data isn't "write once" */
        int count = (int) pHeader->staticFieldsSize;
        u4 lastIndex = 0;
        DexField field;

        newClass->sfieldCount = count;
        for (i = 0; i < count; i++) {
            dexReadClassDataField(&pEncodedData, &field, &lastIndex);
            loadSFieldFromDex(newClass, &field, &newClass->sfields[i]);
        }
    }

7.  if (pHeader->instanceFieldsSize != 0) {
        int count = (int) pHeader->instanceFieldsSize;
        u4 lastIndex = 0;
        DexField field;

        newClass->ifieldCount = count;
        newClass->ifields = (InstField*) dvmLinearAlloc(classLoader,
                count * sizeof(InstField));
        for (i = 0; i < count; i++) {
            dexReadClassDataField(&pEncodedData, &field, &lastIndex);
            loadIFieldFromDex(newClass, &field, &newClass->ifields[i]);
        }
        dvmLinearReadOnly(classLoader, newClass->ifields);
    }

    /*
     * Load method definitions.  We do this in two batches, direct then
     * virtual.
     *
     * If register maps have already been generated for this class, and
     * precise GC is enabled, we pull out pointers to them.  We know that
     * they were streamed to the DEX file in the same order in which the
     * methods appear.
     *
     * If the class wasn't pre-verified, the maps will be generated when
     * the class is verified during class initialization.
     */
    u4 classDefIdx = dexGetIndexForClassDef(pDexFile, pClassDef);
    const void* classMapData;
    u4 numMethods;

    if (gDvm.preciseGc) {
        classMapData =
            dvmRegisterMapGetClassData(pDexFile, classDefIdx, &numMethods);

        /* sanity check */
        if (classMapData != NULL &&
            pHeader->directMethodsSize + pHeader->virtualMethodsSize != numMethods)
        {
            ALOGE("ERROR: in %s, direct=%d virtual=%d, maps have %d",
                newClass->descriptor, pHeader->directMethodsSize,
                pHeader->virtualMethodsSize, numMethods);
            assert(false);
            classMapData = NULL;        /* abandon */
        }
    } else {
        classMapData = NULL;
    }

8.  if (pHeader->directMethodsSize != 0) {
        int count = (int) pHeader->directMethodsSize;
        u4 lastIndex = 0;
        DexMethod method;

        newClass->directMethodCount = count;
        newClass->directMethods = (Method*) dvmLinearAlloc(classLoader,
                count * sizeof(Method));
        for (i = 0; i < count; i++) {
            dexReadClassDataMethod(&pEncodedData, &method, &lastIndex);
            loadMethodFromDex(newClass, &method, &newClass->directMethods[i]);
            if (classMapData != NULL) {
                const RegisterMap* pMap = dvmRegisterMapGetNext(&classMapData);
                if (dvmRegisterMapGetFormat(pMap) != kRegMapFormatNone) {
                    newClass->directMethods[i].registerMap = pMap;
                    /* TODO: add rigorous checks */
                    assert((newClass->directMethods[i].registersSize+7) / 8 ==
                        newClass->directMethods[i].registerMap->regWidth);
                }
            }
        }
        dvmLinearReadOnly(classLoader, newClass->directMethods);
    }

9.  if (pHeader->virtualMethodsSize != 0) {
        int count = (int) pHeader->virtualMethodsSize;
        u4 lastIndex = 0;
        DexMethod method;

        newClass->virtualMethodCount = count;
        newClass->virtualMethods = (Method*) dvmLinearAlloc(classLoader,
                count * sizeof(Method));
        for (i = 0; i < count; i++) {
            dexReadClassDataMethod(&pEncodedData, &method, &lastIndex);
            loadMethodFromDex(newClass, &method, &newClass->virtualMethods[i]);
            if (classMapData != NULL) {
                const RegisterMap* pMap = dvmRegisterMapGetNext(&classMapData);
                if (dvmRegisterMapGetFormat(pMap) != kRegMapFormatNone) {
                    newClass->virtualMethods[i].registerMap = pMap;
                    /* TODO: add rigorous checks */
                    assert((newClass->virtualMethods[i].registersSize+7) / 8 ==
                        newClass->virtualMethods[i].registerMap->regWidth);
                }
            }
        }
        dvmLinearReadOnly(classLoader, newClass->virtualMethods);
    }

    newClass->sourceFile = dexGetSourceFile(pDexFile, pClassDef);

    /* caller must call dvmReleaseTrackedAlloc */
    return newClass;
}

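Step 1: allocate memory for the ClassObject (newClass);
Step 2: initialize the object's class pointer so that it refers to java.lang.Class. The native structure behind every Java object, ClassObject included, derives from Object: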

struct Object {
    /* ptr to class object */
    ClassObject*    clazz;

    /*
     * A word containing either a "thin" lock or a "fat" monitor.  See
     * the comments in Sync.c for a description of its layout.
     */
    u4              lock;
};

struct ClassObject : Object {
    ...
};

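So every object carries an 8-byte header (on a 32-bit build). This step initializes the clazz member;
Step 3: initialize the ClassLoader, the Dex object, the loading status and so on. The status is CLASS_IDX at this point. Unlike in CLASS_LOADED, the super and interfaces members of a ClassObject in the CLASS_IDX state hold numeric indices rather than direct pointers/references;
Step 4: store the superclass index in newClass->super;
Step 5: store the interface type indices in newClass->interfaces;
Step 6: load the static fields;
Step 7: load the instance fields;
Step 8: load the direct methods (static methods, private methods and constructors);
Step 9: load the virtual methods.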
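The CLASS_IDX state relies on a two-phase scheme: loading stores a table index in the slot that will later hold the superclass pointer, and linking swaps the index for a real reference. The following is a minimal sketch of that idea, with hypothetical names throughout. It uses an anonymous union to make the shared storage explicit, whereas the Dalvik code casts the index into the pointer field directly.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

enum ToyStatus { TOY_IDX, TOY_LOADED };

const uint32_t kNoIndex = 0xffffffff;   // "no superclass" marker for this sketch

struct ToyClass {
    std::string descriptor;
    ToyStatus status;
    union {                 // one slot, two meanings
        uint32_t superIdx;  // valid while status == TOY_IDX
        ToyClass* super;    // valid once status == TOY_LOADED
    };
};

// Phase 1 ("load"): only record the index of the superclass.
ToyClass loadToyClass(const std::string& descriptor, uint32_t superTypeIdx) {
    ToyClass c;
    c.descriptor = descriptor;
    c.status = TOY_IDX;
    c.superIdx = superTypeIdx;
    return c;
}

// Phase 2 ("link"): replace the index with a pointer to the resolved class.
void linkToyClass(ToyClass& c, const std::vector<ToyClass*>& resolvedTypes) {
    if (c.status != TOY_IDX) return;
    const uint32_t idx = c.superIdx;  // pull the index out before overwriting the slot
    c.super = nullptr;
    c.status = TOY_LOADED;
    if (idx != kNoIndex) c.super = resolvedTypes[idx];
}
```

Reading the slot is only meaningful together with the status, which is why the status is switched at the same moment the index is pulled out.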

2.3.4 dvmAddClassToHash

This step adds the loaded class to the hash table gDvm.loadedClasses:

bool dvmAddClassToHash(ClassObject* clazz)
{
    ......
*   found = dvmHashTableLookup(gDvm.loadedClasses, hash, clazz,
                hashcmpClassByClass, true);
    ......
    return (found == (void*) clazz);
}

The last argument of dvmHashTableLookup says whether the item should be added when it is not found. The details need no further expansion here.
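The return value of dvmAddClassToHash is what lets findClassNoInit detect that another thread loaded the same class first: the lookup-or-add returns whatever ends up in the table, and the caller compares that with its own candidate. The following is a simplified sketch of this idiom. The names are hypothetical, std::map keyed by descriptor replaces the VM hash table, and the locking that a real shared table needs is left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

struct ToyClass { std::string descriptor; };

static std::map<std::string, ToyClass*> gLoaded;

// Look up by descriptor, inserting the candidate if nothing is there yet.
// Returns the entry that is in the table afterwards.
ToyClass* lookupOrAdd(ToyClass* candidate) {
    auto result = gLoaded.insert(std::make_pair(candidate->descriptor, candidate));
    return result.first->second;  // existing entry if insert did nothing
}

// True if the candidate became the registered class,
// false if an earlier copy was already present.
bool addClassToHash(ToyClass* candidate) {
    return lookupOrAdd(candidate) == candidate;
}
```

When addClassToHash returns false, the caller discards its own copy and continues with the class that is already registered, which matches the "Grab the winning class" branch in findClassNoInit.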

2.3.5 dvmLinkClass

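loadClassFromDex, covered above, stores the superclass and interface references in ClassObject as indices. (As an aside, look at the structures defined in DexFile.h: classIdx and protoIdx are both of type u2, i.e. two bytes. Does this mean that at most 2^16 = 65,536 types and method prototypes can be referenced?) dvmLinkClass, the subject of this section, replaces these indices with real references: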

bool dvmLinkClass(ClassObject* clazz)
{
    ......

    /* "Resolve" the class.
     *
     * At this point, clazz's reference fields may contain Dex file
     * indices instead of direct object references.  Proxy objects are
     * an exception, and may be the only exception.  We need to
     * translate those indices into real references, and let the GC
     * look inside this ClassObject.
     */
    if (clazz->status == CLASS_IDX) {

        ......

        superclassIdx = (u4) clazz->super;
        clazz->super = NULL;
        /* After this line, clazz will be fair game for the GC. The
         * superclass and interfaces are all NULL.
         */
        clazz->status = CLASS_LOADED;

        if (superclassIdx != kDexNoIndex) {
1.          ClassObject* super = dvmResolveClass(clazz, superclassIdx, false);
            if (super == NULL) {
                assert(dvmCheckException(dvmThreadSelf()));
                if (gDvm.optimizing) {
                    /* happens with "external" libs */
                    ALOGV("Unable to resolve superclass of %s (%d)",
                         clazz->descriptor, superclassIdx);
                } else {
                    ALOGW("Unable to resolve superclass of %s (%d)",
                         clazz->descriptor, superclassIdx);
                }
                goto bail;
            }
            dvmSetFieldObject((Object *)clazz,
                              OFFSETOF_MEMBER(ClassObject, super),
                              (Object *)super);
        }

2.      if (clazz->interfaceCount > 0) {
            /* Resolve the interfaces implemented directly by this class. */
            assert(interfaceIdxArray != NULL);
            dvmLinearReadWrite(clazz->classLoader, clazz->interfaces);
            for (i = 0; i < clazz->interfaceCount; i++) {
                assert(interfaceIdxArray[i] != kDexNoIndex);
                clazz->interfaces[i] =
                    dvmResolveClass(clazz, interfaceIdxArray[i], false);
                ......
            }
            dvmLinearReadOnly(clazz->classLoader, clazz->interfaces);
        }
    }
    /*
     * There are now Class references visible to the GC in super and
     * interfaces.
     */

    /*
     * All classes have a direct superclass, except for
     * java/lang/Object and primitive classes. Primitive classes are
     * are created CLASS_INITIALIZED, so won't get here.
     */
    assert(clazz->primitiveType == PRIM_NOT);
    if (strcmp(clazz->descriptor, "Ljava/lang/Object;") == 0) {
        ......
    } else {
        if (clazz->super == NULL) {
            dvmThrowLinkageError("no superclass defined");
            goto bail;
        }
        /* verify */
3.      if (dvmIsFinalClass(clazz->super)) {
            ALOGW("Superclass of '%s' is final '%s'",
                clazz->descriptor, clazz->super->descriptor);
            dvmThrowIncompatibleClassChangeError("superclass is final");
            goto bail;
        } else if (dvmIsInterfaceClass(clazz->super)) {
            ALOGW("Superclass of '%s' is interface '%s'",
                clazz->descriptor, clazz->super->descriptor);
            dvmThrowIncompatibleClassChangeError("superclass is an interface");
            goto bail;
        } else if (!dvmCheckClassAccess(clazz, clazz->super)) {
            ALOGW("Superclass of '%s' (%s) is not accessible",
                clazz->descriptor, clazz->super->descriptor);
            dvmThrowIllegalAccessError("superclass not accessible");
            goto bail;
        }

        /* Inherit finalizability from the superclass.  If this
         * class also overrides finalize(), its CLASS_ISFINALIZABLE
         * bit will already be set.
         */
        if (IS_CLASS_FLAG_SET(clazz->super, CLASS_ISFINALIZABLE)) {
            SET_CLASS_FLAG(clazz, CLASS_ISFINALIZABLE);
        }

        /* See if this class descends from java.lang.Reference
         * and set the class flags appropriately.
         */
4.      if (IS_CLASS_FLAG_SET(clazz->super, CLASS_ISREFERENCE)) {
            u4 superRefFlags;

            /* We've already determined the reference type of this
             * inheritance chain.  Inherit reference-ness from the superclass.
             */
            superRefFlags = GET_CLASS_FLAG_GROUP(clazz->super,
                    CLASS_ISREFERENCE |
                    CLASS_ISWEAKREFERENCE |
                    CLASS_ISFINALIZERREFERENCE |
                    CLASS_ISPHANTOMREFERENCE);
            SET_CLASS_FLAG(clazz, superRefFlags);
        } else if (clazz->classLoader == NULL &&
                clazz->super->classLoader == NULL &&
                strcmp(clazz->super->descriptor,
                       "Ljava/lang/ref/Reference;") == 0)
        {
            u4 refFlags;

            /* This class extends Reference, which means it should
             * be one of the magic Soft/Weak/PhantomReference classes.
             */
            refFlags = CLASS_ISREFERENCE;
            if (strcmp(clazz->descriptor,
                       "Ljava/lang/ref/SoftReference;") == 0)
            {
                /* Only CLASS_ISREFERENCE is set for soft references.
                 */
            } else if (strcmp(clazz->descriptor,
                       "Ljava/lang/ref/WeakReference;") == 0)
            {
                refFlags |= CLASS_ISWEAKREFERENCE;
            } else if (strcmp(clazz->descriptor,
                       "Ljava/lang/ref/FinalizerReference;") == 0)
            {
                refFlags |= CLASS_ISFINALIZERREFERENCE;
            }  else if (strcmp(clazz->descriptor,
                       "Ljava/lang/ref/PhantomReference;") == 0)
            {
                refFlags |= CLASS_ISPHANTOMREFERENCE;
            } else {
                /* No-one else is allowed to inherit directly
                 * from Reference.
                 */
//xxx is this the right exception?  better than an assertion.
                dvmThrowLinkageError("illegal inheritance from Reference");
                goto bail;
            }

            /* The class should not have any reference bits set yet.
             */
            assert(GET_CLASS_FLAG_GROUP(clazz,
                    CLASS_ISREFERENCE |
                    CLASS_ISWEAKREFERENCE |
                    CLASS_ISFINALIZERREFERENCE |
                    CLASS_ISPHANTOMREFERENCE) == 0);

            SET_CLASS_FLAG(clazz, refFlags);
        }
    }

    /*
     * Populate vtable.
     */
5.  if (dvmIsInterfaceClass(clazz)) {
        /* no vtable; just set the method indices */
        int count = clazz->virtualMethodCount;

        if (count != (u2) count) {
            ALOGE("Too many methods (%d) in interface '%s'", count,
                 clazz->descriptor);
            goto bail;
        }

        dvmLinearReadWrite(clazz->classLoader, clazz->virtualMethods);

        for (i = 0; i < count; i++)
            clazz->virtualMethods[i].methodIndex = (u2) i;

        dvmLinearReadOnly(clazz->classLoader, clazz->virtualMethods);
    } else {
        if (!createVtable(clazz)) {
            ALOGW("failed creating vtable");
            goto bail;
        }
    }

    /*
     * Populate interface method tables.  Can alter the vtable.
     */
6.  if (!createIftable(clazz))
        goto bail;

    /*
     * Insert special-purpose "stub" method implementations.
     */
7.  if (!insertMethodStubs(clazz))
        goto bail;

    /*
     * Compute instance field offsets and, hence, the size of the object.
     */
8.  if (!computeFieldOffsets(clazz))
        goto bail;

    /*
     * Cache field and method info for the class Reference (as loaded
     * by the boot classloader). This has to happen after the call to
     * computeFieldOffsets().
     */
    if ((clazz->classLoader == NULL)
            && (strcmp(clazz->descriptor, "Ljava/lang/ref/Reference;") == 0)) {
        if (!precacheReferenceOffsets(clazz)) {
            ALOGE("failed pre-caching Reference offsets");
            dvmThrowInternalError(NULL);
            goto bail;
        }
    }

    /*
     * Compact the offsets the GC has to examine into a bitmap, if
     * possible.  (This has to happen after Reference.referent is
     * massaged in precacheReferenceOffsets.)
     */
    computeRefOffsets(clazz);

    /*
     * Done!
     */
9.  if (IS_CLASS_FLAG_SET(clazz, CLASS_ISPREVERIFIED))
        clazz->status = CLASS_VERIFIED;
    else
        clazz->status = CLASS_RESOLVED;
    okay = true;
    if (gDvm.verboseClass)
        ALOGV("CLASS: linked '%s'", clazz->descriptor);

    /*
     * We send CLASS_PREPARE events to the debugger from here.  The
     * definition of "preparation" is creating the static fields for a
     * class and initializing them to the standard default values, but not
     * executing any code (that comes later, during "initialization").
     *
     * We did the static prep in loadSFieldFromDex() while loading the class.
     *
     * The class has been prepared and resolved but possibly not yet verified
     * at this point.
     */
    if (gDvm.debuggerActive) {
        dvmDbgPostClassPrepare(clazz);
    }

bail:
    if (!okay) {
        clazz->status = CLASS_ERROR;
        if (!dvmCheckException(dvmThreadSelf())) {
            dvmThrowVirtualMachineError(NULL);
        }
    }
    if (interfaceIdxArray != NULL) {
        free(interfaceIdxArray);
    }

    return okay;
}

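Step 1: replace clazz->super (the superclass) with a reference to the real superclass ClassObject. This is done with dvmResolveClass. Although this article starts from ClassLoader.loadClass, the most common case in practice is the interpreter running into a class that has not been resolved yet while executing a method, at which point it calls dvmResolveClass to resolve it: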

ClassObject* dvmResolveClass(const ClassObject* referrer, u4 classIdx,
    bool fromUnverifiedConstant)
{
    ......
1.  resClass = dvmDexGetResolvedClass(pDvmDex, classIdx);
    if (resClass != NULL)
        return resClass;

    ......
    if (className[0] != '\0' && className[1] == '\0') {
        /* primitive type */
        resClass = dvmFindPrimitiveClass(className[0]);
    } else {
2.      resClass = dvmFindClassNoInit(className, referrer->classLoader);
    }

    if (resClass != NULL) {
        ......
3.      dvmDexSetResolvedClass(pDvmDex, classIdx, resClass);
    } else {
        ......
    }

    return resClass;
}

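(1) Look the class up in the DEX file's table of already-resolved classes (a class found there has status CLASS_RESOLVED or later).
(2) If the class has not been resolved yet, call dvmFindClassNoInit to load and link it. What does dvmFindClassNoInit do? It looks a lot like the findClassNoInit discussed earlier, and in fact, when the referrer has a non-null class loader, it ends up calling back into the Java-level ClassLoader.loadClass to load the class, which takes us right back to the start of this article ~·~
(3) Add the class to the table of resolved classes.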
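Points (1) to (3) amount to a cache sitting in front of a slow lookup. Here is a minimal sketch of that pattern in C. Everything prefixed with Toy/toy is a hypothetical stand-in invented for illustration, not Dalvik's real types or functions; toyFindClass plays the role of dvmFindClassNoInit, and the fixed-size table plays the role of the per-DEX resolved-class table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ClassObject: only the descriptor matters here. */
typedef struct {
    const char *descriptor;
} ToyClass;

#define TOY_TYPE_COUNT 4

/* Hypothetical stand-in for the per-DEX tables. */
typedef struct {
    const char *typeNames[TOY_TYPE_COUNT]; /* class index -> descriptor */
    ToyClass   *resolved[TOY_TYPE_COUNT];  /* class index -> cached class, NULL if unresolved */
} ToyDex;

/* Classes the toy "class loader" knows about. */
static ToyClass gLoaded[] = {
    { "Ljava/lang/Object;" },
    { "Lcom/example/Foo;" },
};

/* Counts how often the slow path runs, so the caching is observable. */
static int gSlowLookups = 0;

/* Slow path: stands in for dvmFindClassNoInit in the listing above. */
static ToyClass *toyFindClass(const char *descriptor)
{
    gSlowLookups++;
    for (size_t i = 0; i < sizeof(gLoaded) / sizeof(gLoaded[0]); i++) {
        if (strcmp(gLoaded[i].descriptor, descriptor) == 0)
            return &gLoaded[i];
    }
    return NULL;
}

/* Mirrors the three numbered points: cache hit, slow lookup, cache store. */
static ToyClass *toyResolveClass(ToyDex *dex, unsigned classIdx)
{
    assert(classIdx < TOY_TYPE_COUNT);

    ToyClass *res = dex->resolved[classIdx];        /* (1) already resolved? */
    if (res != NULL)
        return res;

    res = toyFindClass(dex->typeNames[classIdx]);   /* (2) load and link */

    if (res != NULL)
        dex->resolved[classIdx] = res;              /* (3) remember the result */

    return res;                                     /* failures are not cached */
}
```

Resolving the same index twice only hits the slow path once; a failed lookup is retried on every call because only successful results are stored, which matches the shape of the dvmResolveClass listing above.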
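Step 2: interfaceIdxArray was filled earlier with a copy of the type indices stored in clazz->interfaces. This step overwrites each entry of clazz->interfaces with a reference to the resolved interface, which is itself a ClassObject.
Step 3: if the class is not java.lang.Object, it must have a superclass. Check that the superclass is not final, is not an interface, and is accessible from this class.
Step 4: classes such as SoftReference and WeakReference get special treatment: the reference-type flags are either inherited from the superclass or set according to the class descriptor.
Step 5: create the vtable for non-interface classes (an interface has no vtable and only gets its method indices assigned).
Step 6: create the interface table, iftable.
Step 7: replace the implementation of every abstract virtual method with a native stub that throws an AbstractMethodError with the message "abstract method not implemented".
Step 8: rearrange the instance fields (references are moved ahead of non-references, and all double-wide fields are aligned), then compute the field offsets and the size of an instance.
Step 9: essentially done. If the class already passed dvmVerifyClass during the dexopt stage and was tagged CLASS_ISPREVERIFIED, its status is set to CLASS_VERIFIED; otherwise it is set to CLASS_RESOLVED.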
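Step 8 is easier to picture with concrete numbers. The following is a simplified sketch of the layout rule, not the actual computeFieldOffsets. The Toy* names are hypothetical, references are assumed to be 4 bytes wide (a 32-bit VM), and the starting offset is assumed to be the size of the object header or of the superclass instance. The sketch only assigns offsets and simply pads before the first double-wide field, whereas the real function also reorders the field array itself and, as far as I can tell, first tries to fill the alignment gap with a 32-bit field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical field categories: object reference, 64-bit primitive, 32-bit primitive. */
typedef enum { TOY_REF, TOY_WIDE, TOY_NARROW } ToyKind;

typedef struct {
    const char *name;
    ToyKind     kind;
    size_t      byteOffset;   /* filled in by toyComputeFieldOffsets */
} ToyField;

/* Assumption: references and narrow primitives take 4 bytes, wide ones 8. */
static size_t toySizeOf(ToyKind kind)
{
    return kind == TOY_WIDE ? 8 : 4;
}

/*
 * Assigns an offset to every field and returns the resulting object size.
 * Order: all references first, then 8-byte-aligned wide fields, then the rest.
 * startOffset is where this class's fields begin (header or superclass size).
 */
static size_t toyComputeFieldOffsets(ToyField *fields, size_t count,
                                     size_t startOffset)
{
    static const ToyKind order[] = { TOY_REF, TOY_WIDE, TOY_NARROW };
    size_t offset = startOffset;

    for (size_t pass = 0; pass < sizeof(order) / sizeof(order[0]); pass++) {
        for (size_t i = 0; i < count; i++) {
            if (fields[i].kind != order[pass])
                continue;
            if (fields[i].kind == TOY_WIDE) {
                /* Round up to the next multiple of 8 (wastes the gap). */
                offset = (offset + 7) & ~(size_t)7;
            }
            fields[i].byteOffset = offset;
            offset += toySizeOf(fields[i].kind);
        }
    }
    return offset;
}
```

Grouping the references at the front is what lets the GC describe them compactly, which is the job of computeRefOffsets in the listing above.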

2.4 Summary

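At this point the class loading process is complete. What Java code actually executes is always a method body, so the interpreter's work is the execution of methods, and that execution is accompanied by class initialization (<clinit>) and object initialization (<init>). The next section covers class initialization.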
