{ "cells": [ { "cell_type": "markdown", "id": "7f6cada9", "metadata": {}, "source": [ "# Manual Feature Engineering\n", "\n", "Manual feature engineering can be a tedious process (which is why automated feature engineering with Featuretools is used!) and often relies on domain expertise.\n", "Because I have limited domain knowledge of loans and of what makes a person default, I will instead focus on getting as much information as possible into the final training data.\n", "The model will then pick out which features are important, without us having to decide.\n", "Basically, our approach is to create as many features as possible and then give all of them to the model to use.\n", "Later, we can perform feature reduction using the feature importances from the model or other techniques such as PCA.\n", "\n", "Manual feature engineering takes plenty of pandas code, a little patience, and a lot of practice manipulating data. \n", "Even though automated feature engineering tools are becoming available, feature engineering will still require plenty of data wrangling for a while longer.\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "e2515f02", "metadata": {}, "outputs": [], "source": [ "# pandas and numpy for data manipulation\n", "import pandas as pd\n", "import numpy as np\n", "\n", "# matplotlib and seaborn for plotting\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "# Suppress warnings from pandas\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "plt.style.use('fivethirtyeight')" ] }, { "cell_type": "code", "execution_count": 3, "id": "010a5020", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1716428, 17)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Read in bureau\n", "bureau = pd.read_csv('./input/bureau.csv')\n", "bureau.head()\n", "bureau.shape" ] }, { "cell_type": "code", "execution_count": 4, "id": "96078687", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr><th></th><th>SK_ID_CURR</th><th>previous_loan_counts</th></tr>\n", " </thead>\n", " <tbody>\n", " <tr><th>0</th><td>100001</td><td>7</td></tr>\n", " <tr><th>1</th><td>100002</td><td>8</td></tr>\n", " <tr><th>2</th><td>100003</td><td>4</td></tr>\n", " <tr><th>3</th><td>100004</td><td>2</td></tr>\n", " <tr><th>4</th><td>100005</td><td>3</td></tr>\n", " </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " SK_ID_CURR previous_loan_counts\n", "0 100001 7\n", "1 100002 8\n", "2 100003 4\n", "3 100004 2\n", "4 100005 3" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Groupby the client id (SK_ID_CURR), count the number of previous loans, and rename the column\n", "previous_loan_counts = bureau.groupby('SK_ID_CURR', as_index=False)['SK_ID_BUREAU'].count().rename(columns = {'SK_ID_BUREAU': 'previous_loan_counts'})\n", "previous_loan_counts.head()\n", "#previous_loan_counts.shape" ] }, { "cell_type": "code", "execution_count": 68, "id": "ee9ef540", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 0.139376\n", "1 NaN\n", "2 0.729567\n", "3 NaN\n", "4 NaN\n", "Name: EXT_SOURCE_3, dtype: float64" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Join to the training dataframe\n", "train = pd.read_csv('./input/application_train.csv')\n", "train = train.merge(previous_loan_counts, on = 'SK_ID_CURR', how = 'left')\n", "\n", "# Fill the missing values with 0 \n", "train['previous_loan_counts'] = train['previous_loan_counts'].fillna(0)\n", "train.head()" ] }, { "cell_type": "markdown", "id": "577af56f", "metadata": {}, "source": [ "## Assessing Usefulness of New Variable with r value\n", "To check whether the new variable is useful, we can calculate the Pearson correlation coefficient (r-value) between this variable and the target. It measures the strength of the linear relationship between two variables and ranges from -1 (perfectly negatively linear) to +1 (perfectly positively linear). The r-value is not the best measure of the \"usefulness\" of a new variable, but it can give a first approximation of whether a variable will help a machine learning model. The larger the absolute r-value of a variable with respect to the target, the more strongly changes in this variable are linearly associated with changes in the target. We therefore look for the variables with the greatest absolute r-value relative to the target.\n", "\n", "### Kernel Density Estimate Plots\n", "\n", "A kernel density estimate (KDE) plot shows the distribution of a single variable (think of it as a smoothed histogram). To see how the distribution differs with the value of a categorical variable, we can color the distributions by category. For example, we can show the kernel density estimate of `previous_loan_counts` colored by whether TARGET is 1 or 0. The resulting KDE shows any significant difference in the distribution of the variable between people who did not repay their loan (TARGET == 1) and people who did (TARGET == 0). This can serve as an indicator of whether a variable will be 'relevant' to a machine learning model.\n" ] }, { "cell_type": "code", "execution_count": 6, "id": "0209f335", "metadata": {}, "outputs": [], "source": [ "# Plots the distribution of a variable colored by value of the target\n", "def kde_target(var_name, df):\n", "    \n", "    # Calculate the correlation coefficient between the new variable and the target\n", "    corr = df['TARGET'].corr(df[var_name])\n", "    \n", "    # Calculate medians for repaid vs not repaid\n", "    avg_repaid = df.loc[df['TARGET'] == 0, var_name].median()\n", "    avg_not_repaid = df.loc[df['TARGET'] == 1, var_name].median()\n", "    \n", "    plt.figure(figsize = (12, 6))\n", "    \n", "    # Plot the distribution for target == 0 and target == 1\n", "    sns.kdeplot(df.loc[df['TARGET'] == 0, var_name], label = 'TARGET == 0')\n", "    sns.kdeplot(df.loc[df['TARGET'] == 1, var_name], label = 'TARGET == 1')\n", "    \n", "    # Label the plot\n", "    plt.xlabel(var_name); plt.ylabel('Density'); plt.title('%s Distribution' % var_name)\n", "    plt.legend();\n", "    \n", "    # Print out the correlation\n", "    print('The correlation between %s and the TARGET is %0.4f' % (var_name, corr))\n", "    # Print out the median values\n", "    print('Median value for loan that was not repaid = %0.4f' % avg_not_repaid)\n", "    print('Median value for loan that was repaid = %0.4f' % avg_repaid)" ] }, { "cell_type": "markdown", "id": "fdd751ea", "metadata": {}, "source": [ "We can test this function using the EXT_SOURCE_3 variable, which is one of the most important variables according to a Random Forest and a Gradient Boosting Machine.\n" ] }, { "cell_type": "code", "execution_count": 7, "id": "250b3e6b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The correlation between EXT_SOURCE_3 and the TARGET is -0.1789\n", "Median value for loan that was not repaid = 0.3791\n", "Median value for loan that was repaid = 0.5460\n" ] }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAzYAAAGoCAYAAACUvHBbAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAADAu0lEQVR4nOzdd1gUVxcH4N9sX3oHxYIVsVfs2HvvGmOMKRpbmjWJNYkao4lJjBoTY2KvsVfsHbtiRVQsKB2Wun3m+4PPxWEBAYHZZc/7PDzJ3rmznGVwmbP33nMZlUrFgRBCCCGEEEKsmEjoAAghhBBCCCHkbVFiQwghhBBCCLF6lNgQQgghhBBCrB4lNoQQQgghhBCrR4kNIYQQQgghxOpRYkMIIYQQQgixepTYEEIIISWsTp06qFOnjmDff+zYsXBxccHTp09NbU+fPoWLiwt69OghWFwAsGDBAri4uODMmTOCxkEIsT6U2BBCrI6Li8sbv/bt22fqP3r0aLi4uGD+/Pk5Pp9Go0GzZs3g6uqKU6dOoU6dOvn6Hq++FixYUODXkJycjAULFiAoKAjlypWDl5cXatSogfbt22PatGm4dOlSjucZjUZs3LgRAwcORPXq1eHp6YnKlSuje/fuWLFiBdRqdY7n9ejR4403i69uKLO/nlfnvv5VtmxZNG/eHLNmzUJCQkKuz6lSqfDzzz+jW7duqFKlCjw8PFCxYkV07NgR33//PZ48ecLr/+qGO6+vd955J9fvl5edO3di5MiRaNy4MSpUqAAfHx80atQIH330Ea5fv16g59qwYUOOP5OAgAD06tULc+fOxb179woVZ364uLgImhi9jVe/Zxs2bBA6FEJIKSMROgBCCCmsadOm5XqsevXqpv9fvHgxLly4gJ9++gmdOnVCkyZNeH1nzpyJ+/fvY9y4cWjTpg3Gjh2L5ORkXp/9+/fj9u3b6N69u9kNZatWrQoUd3R0NLp27YonT56gYsWKGDBgANzc3BAVFYWHDx9i1apV0Ol0CAwM5J0XFRWF4cOH49q1a3B3d0enTp1QtmxZJCQk4NixY/jqq6/w559/YvPmzfD39y9QTPkxbNgwVKhQARzHITY2FocPH8Zvv/2G3bt34+TJk3B1deX1P3LkCD7++GOoVCr4+fmhW7du8PLyQlpaGm7evIklS5bg119/xZEjR1C/fn3euTn9nF95/doWxK5duxAaGooGDRrA29sbUqkUjx8/xp49e/Dff/9hyZIleP/99wv0nLVr1zaNcOh0OsTFxeHGjRtYsmQJlixZguHDh2PRokWws7Pjnbdnz55CvYaiMnv2bHzxxRcoW7asoHHkZPTo0RgwYADKlSsndCiEECtDiQ0hxGp99dVX+ern5OSEP/74A7169cLo0aNx5swZODg4AACOHj2Kv/76CzVr1sTs2bMBAOPGjTN7jmfPnuH27dvo0aMHhg8f/lZxz58/H0+ePMHw4cPx+++/g2EY3vH4+HhERETw2jIyMjBw4EDcuXMHgwYNwpIlS0yvAQD0ej2+++47/Pbbb+jXrx9OnToFT0/Pt4ozu3feeQetW7fmxdSxY0fcvXsXK1euxPTp003Hzp8/j3feeQcikQi//fYbRowYYfY6IyIiMHv2bKSmppp9r6L4OWe3cuVKKBQKs/Zbt26hY8eOmDlzJoYNGwa5XJ7v56xTp06Ov4c3btzA2LFjsWHDBiQkJGDz5s2845UqVSr4CyhCPj4+8PHxETSG3Li7u8Pd3V3oMAghVoimohFCbELLli3x6aefIiIiwnQDnpCQgPHjx0Mul+PPP/8s0A3t27h48SIAYMyYMWY3+wDg4eFhNqq0fPly3LlzB02aNMEff/zBS2oAQCqV4ttvv0WfPn3w8uVLzJs3r/hewP/Z2dlhyJAhAMCbysWyLL744gvo9XrMnz8f7733Xo6vs1KlSli7dq3ZyFRxySmpATKTk+rVqyM1NRUxMTFF8r3q16+PXbt2wd3dHYc
OHcLBgwfNvmf2ESmtVovly5cjKCgIfn5+8PHxQe3atTFw4EDTCM+ZM2fg4uICAHj+/DlvKtzYsWNNz/VqqppKpcKUKVNQq1YtuLu7Y/ny5QByXmPzuqioKIwePRpVqlSBj48P2rZtix07dpj1ezUlL7dpZdlfZ48ePbBw4UIAwPjx43nxv4olrzU2p0+fxqBBg1CpUiV4eXmhXr16mDZtGuLi4sz6vnqNZ86cwe7du9G+fXuUKVMGfn5+GDVqFF68eJFjzIQQ60UjNoQQm/H111/j+PHjWL9+Pbp06YJNmzYhJiYG33//PWrXrl1icbi5uQEAHj16hLp16+brnDVr1gAAJk+eDLFYnGu/adOmYffu3di8eTN++OGHXG/mi5pEkvXn5Ny5cwgLC0OZMmUwcuTIN55bUgllbh4+fIiHDx/Czc0Nvr6+Rfa83t7eGDVqFBYvXoytW7eiW7duefb/5JNPsHPnTtSoUQODBw+Gvb09oqKicO3aNezbtw+9e/dGhQoVMG3aNCxcuBBOTk68ZCZ7oqTT6dC7d2+kpKSgc+fOkMlk+Zp6plKp0KVLFzg7O+Pdd9+FSqXCzp078cEHHyAqKgrjx48v3A8EMK2POnfunNl0Q2dn5zzP/eeff/Dll19CqVSiT58+8PHxwcWLF7Fy5Urs378fBw8eRPny5c3O+/vvv3Hw4EF0794dLVu2xJUrV7Bz507cunUL586dE/z3jxBSdCixIYRYrbwW7ec0PUgmk+HPP/9E27Zt8dFHH0Gr1SIoKOitbtQKo3///rhw4QI+/fRTXL16FW3atEG9evXg5eWVY//IyEg8f/4cEokEQUFBeT53zZo14ePjg+joaFy/fh3NmzcvjpcAAFCr1diyZQsA8L7PhQsXAACtW7fmJTwFsX//fjx79izHYz169Mh3QpiT4OBgXL16FTqdDk+fPsXhw4fBMAyWLVuWZ9JYGK1bt8bixYtx5cqVPPslJydj165dqFevHo4dO2b2c3tVoKFixYr46quvsHDhQjg7O+c5HTMmJgY1atTAoUOHzNb45OXOnTvo168f/v77b4hEmRM7Pv/8c7Rp0wZz585Fr169UKFChXw/3+uGDx+OZ8+e4dy5cwWabvjs2TNMmzYNdnZ2OHr0KAICAkzHvv/+eyxevBiTJk3C1q1bzc49fvw4Tp06hRo1apjaPvroI2zfvh379+9H//79C/VaCCGWhxIbQojVejWlJSe53fDVqFED7777LlatWgWJRILly5fnOE2qOH300UeIiorC8uXL8fvvv+P3338HAPj6+qJNmzb44IMP0LhxY1P/V9Oj3NzcoFQq3/j8vr6+iI6ORnR0dJHGvXHjRpw9exYcxyEuLg7BwcF48eIFWrZsiQ8++MAs3rdZmH7gwAEcOHAgx2MVKlR4q8TmyJEj+Ouvv0yPfXx8sGLFCrRr167Qz5mbV+tY4uPj8+wnEonAcRzkcnmOyVVh15x89913BUpqAEAsFmP27NmmpAbInDb40Ucf4eeff8bWrVsxefLkQsVTWFu3boVOp8Mnn3zCS2oAYMqUKdiwYQOCg4Px8uVLs9+7MWPG8JIaABg5ciS2b9+Oa9euUWJDSClCa2wIIVZLpVLl+pWbiIgI00Jug8GA/fv3l1C0WRiGwaxZs3D//n2sXr0aY8eORVBQEBISErBx40Z06tQJixYtMvXnOM50XkG/T1HatGkTFi5ciB9//BH//PMPXrx4gU6dOmH37t28m+fCxvu6ZcuW5Xpt37aowKJFi6BSqfDixQucPHkSrVq1Qv/+/fNMlN/Wm34Wjo6O6N69Oy5duoSWLVti/vz5OHHiBNLS0gr9PeVyeaFKQpcrVw5+fn5m7S1btgQAhIaGFjqmwrp58yYA5DhiKZfL0axZMwA5x5a94h4A05TDvN4rCCHWhxIbQojNMBqNGDNmDNLS0jB//ny4urpizpw5CA8PFyQeFxcX9O/fHwsWLMCePXsQERGBKVOmgOM
4zJs3D7du3QKQuVYDyJyOlNs+Na97tSj61XkATJ++syyb63mvjr3+Sf3r9u7dC5VKhfj4eISEhKBr1644cuSI2ejYq1EKS1+cbW9vj/r162PVqlXo2LEjFixYkOv+QYX1atQsPyMuq1evxjfffAODwYAff/wR/fr1Q6VKlTBixIhcF/nnxdPTs1DJZW5TIl9V2UtJSSnwc76tV98zt9he/a7nFJuTk5NZ26tRMaPRWFQhEkIsACU2hBCbsWjRIly6dAmDBw/GuHHjsGTJEqjVaowePRoGg0Ho8KBUKvHNN9+YPn0+deoUAKB8+fIoV64cDAbDG3djv3//PqKjo6FQKNCgQQNT+6ubu6SkpFzPTUxMBPDmRdwSiQQ1atTAunXrULt2bfz11184fPiw6fir9TZnz561mhvH9u3bA8hc1F6UXl2v7FXucqJQKDBlyhRcunQJ9+7dw+rVq9GhQwfs3bsXAwcOhF6vL9D3LuyIWWxsbI7tryqPvZ4ovEqCc7vO2feDKqxX3zO32F5Nf8wpiSGE2A5KbAghNuHq1atYvHgxypUrZ5rm1bdvXwwePBjXr1/nTf0SmqOjI4CsKV0A8N577wEAfvrppzxHXV5Npxo6dCivItqrqm8hISG5nvvqWH4rxEmlUvzwww8AgBkzZphublu2bInq1avj5cuXWLt27RufR6vV5uv7FaeXL18CQKGLHeQkJiYGq1evBgAMHjy4QOeWKVMG/fv3x+bNmxEYGIjw8HCEhYWZjotEojx/D95GZGRkjiNEr5K+19c3vSo9HRkZadb/0aNHOY6gFGa0pF69egCQY2Kv1WpNJdRf9SOE2CZKbAghpV56ejpGjx4NlmWxYsUK3ojEokWLUK5cOfz000+4du1aicTz22+/4d69ezkeu3Dhgunm7dWaBiBzz4+AgABcvHgRY8eORXp6Ou88vV6PuXPnYufOnShbtiy+/vpr3vGhQ4dCLBZj3bp1Oa5DWLNmDe7cuYPKlSsXqJJaq1at0KFDB4SHh2Pjxo0AMm+6f/nlF0ilUnz11VfYsGEDL0l75enTp3j//feLfPpXTlJTU3l77bzu2rVr+PfffyESidCpU6ci+X43btxAv379kJiYiK5du6Jr16559o+Pj8fly5fN2rVarWnU4/VE1d3dHfHx8fmamlhQRqMRc+bM4SVOERERWLVqFaRSKQYNGmRqb9iwIUQiEbZu3cpbD5Seno4pU6bk+PyvpuXllAzlZvDgwZDJZPj777/x4MED3rGff/4ZL1++ROfOnVGmTJl8PychpPShqmiEEKuVV7nnjh07mqb/fPXVV3j06BEmTpyI1q1b8/o5Oztj+fLl6NOnD8aMGYPTp0/nq/LY29i6dStmzZqF6tWro3HjxvDx8UF6ejru37+P06dPg+M4jB07Fg0bNjSdY29vj+3bt+Odd97Bli1bcOzYMXTq1Ally5ZFQkICjh07hufPn8PPzw9btmwxW4tQqVIl/PDDD5g2bRo6dOiALl26oHr16tDr9bhy5QouXLgAJycn/PnnnwUueTxjxgwcO3YMCxcuxODBgyGXy9GiRQts2LABo0ePxvjx47F48WK0bt0anp6eSEtLw61bt3Dp0iWIRCJ88cUXZs+ZV7lnb29vXhW2/EhMTES7du1Qs2ZN1K5dG2XKlEFGRgbCwsJMieS3335rVj3rTW7dumX6PdTr9YiPj8f169dN66OGDRuGn3766Y3P8/LlS3Tq1AnVqlVD/fr14evri/T0dBw/fhyPHj1Cr169ULVqVVP/du3aYevWrRgwYABatGgBuVyO2rVrv3GvnPyoVasWrl69irZt26J9+/ZISkrCzp07kZKSgnnz5qFixYqmvt7e3hg+fDjWrVuH1q1bo3PnztBoNDh27BgqVKiQY6LRpk0biEQi/PHHH0hKSjL9ro4ePTrXaZAVKlTAwoUL8eWXX6Jdu3bo27cvvL29cfHiRZw7dw6+vr75+jkTQko3SmwIIVY
rrypWzs7OaNKkCQ4cOIC1a9eiZs2amDFjRo59g4KCMG7cOCxbtgyzZs0q9mlpy5Ytw5EjR3DmzBmcO3cOsbGxYFkWXl5e6NWrF0aMGJHjyIGvry+OHTuGzZs3Y8eOHThy5AhUKhUcHR1Ro0YNjBs3Du+//36uidnHH3+MunXrYuXKlbh48SKCg4MhFotRvnx5jB49GhMmTCjU/iQNGjRAz549sW/fPvz9998YN24cAKBz5864fv06Vq9ejaNHj2Lfvn1ISUmBnZ0dqlSpgs8//xwjRozg3Si/kle559q1axc4sfHw8MC0adNw7tw5nDlzBgkJCRCJRChbtiyGDBmCjz76iFdiO79u376N27dvA8hcI+Xs7Gx6bYMGDUKtWrXy9TwVKlTA119/bfqdiI+Ph7OzMypXrozPPvvMtLHlKz/88ANEIhFOnDiBixcvwmg0YtiwYUWS2Li4uGD79u2YPXs21q1bh7S0NNSoUQOffvopBgwYYNb/559/hpeXF7Zs2YLVq1fD29sbgwYNwtSpUxEYGGjWv2rVqvj777/x66+/Yv369aZRp8GDB+e5vmvUqFGoXLkyli5div379yM9PR1lypTB6NGjMXny5FwLCxBCbAejUqnM5wcQQgghhBBCiBWhNTaEEEIIIYQQq0eJDSGEEEIIIcTq0RobQgh5S/v27TMtFs+Ls7Ozaf0JeXt5FY94XatWrcyKRhBCCCl9KLEhhJC3tH//fmzatOmN/cqXL0+JTRHKq3hEdpTYEEJI6UfFAwghhBBCCCFWj9bYEEIIIYQQQqweJTak2IWHhwsdAhEQXX/bRtffttH1t210/W2bENefEhtCCCGEEEKI1aPEhhBCCCGEEGL1KLEhhBBCCCGEWD1KbAghhBBCCCFWj/axIYQQQgghRcZgMCA9PR0KhQLJyclCh0ME8jbX397eHhJJwdMUSmwIIYQQQkiRMBgMSE1NhYuLC+RyORQKhdAhEYEU9vpzHAeVSgVHR8cCJzc0FY0QQgghhBSJ9PR0uLi4gGEYoUMhVophGLi4uCA9Pb3A51JiQwghhBBCigwlNeRtFfZ3iBIbQgghhBBCiNWjxIYQQgghhBBi9SixIYQQQgghhFg9SmwIIYQQQgghVo/KPRNCCCHEIj1Q6bHriRo3EvSIVRsRo2aRrGXhJBPBSymCp1KMak4SNPCQoqGHDH6OYlq4TgrMxcUlz+PDhg3DihUrAABDhw5FcHAw/vvvP7Rr147Xb8GCBVi4cCGAzMXv3t7eaN26NWbPno1y5crx+j558gQ//fQTTpw4gdjYWLi5uaFq1ap45513MHDgQMhksjxj+/nnn3H58mVs2rQpz9hVKlWex0vK8+fPMXnyZJw5cwYKhQIDBw7E999/b3qdRYUSG0IIIYRYjDQ9i9X307H1sRq3E/U59knRGxGZbgSgx+HX2l3lDBq4y9DAQ4oGHjI09JChrL24ROIm1issLMz0/4cPH8ann37Ka3u1F0t0dDROnz6NcePGYe3atWaJDQBUq1YN+/btA8uyiIiIwOTJkzFq1CgcOXLE1Of69evo06cPqlevjh9//BHVq1dHRkYGHjx4gDVr1qBy5cpo1qyZqf9vv/2GLl268L6Pk5MTBgwYgDlz5pjaGjRogJkzZ6J///5v/TMpSkajEUOGDIGrqysOHDiApKQkjB07FhzHYdGiRUX6vSixIYQQQojgOI7D9sdqzLqSjKgMtlDPkaTlcPylFsdfak1tPkoR6nvI0PD/ozoNPKRwV1CyU5Jc/nlRot9PNcq3QP29vb1N/+/s7GzW9srGjRvRoUMHjBkzBk2aNEFiYiLc3Nx4fSQSiencMmXKYOTIkZg2bRpSUlLg5OQEjuMwduxYVK5cGcHBwRCJslaF1K1bFwMHDgTHcbzndHZ2zjEepVJpihfIHCVycnLKsa+Qjh8/jnv37uHWrVumkau5c+fi008/xcyZM+Hk5FRk34sSG0IIIYQI6r5Kj8/PqRA
Sqyvy545Wszj0XINDzzWmtrJ2ItR0laKWqxQ13aSo6SpFdWcJ5GKaxkZyxnEc1q9fj7lz56J8+fJo1KgRNm/ejHHjxuV6TkxMDPbu3QuxWAyxODOZDg0Nxf379/H333/zkprXlcR0Sl/fvJO/5s2bY/v27QCA8+fPY9CgQXn2//LLLzFp0qQcj126dAn+/v686XgdOnSAVqvFjRs3EBQUVMDoc0eJDSGEEEIEs/lhBr68oEKGgTM7JmKA1j5y9PVTwt9FAm+lGK5yBiodhzi1ES/SjbiZoMe1eB1uJOiRqjd/jpy8zGDxMkOLoy+yRnbEDFDNWYKarpmJTj13KVp4y2AvpTpLBDhz5gySkpJMU8KGDh2KFStWmCU2YWFh8PX1BcuyUKvVAIAxY8bA3t4eAPDo0SMAQNWqVU3nJCcno2bNmqbH2ZOEMWPGmH2f4OBg1KpV661eT15eTb8DMqe4vam/q6trrsdiY2Ph6enJa3N3d4dYLEZsbGw+os0/wRKbn3/+GXv37sXDhw8hk8nQuHFjzJ49m3dhc3Lnzh1MmTIF165dg6urK95//31MnTqVl92ePXsW33zzDe7fvw8fHx989tln+OCDD4r7JRFCCCEkn9QGDtMuqrD2QYbZMZkIGFfLAeNqOcBLaT5tzE0BVHbKvIXpXzmzjeU4PEw24Fp8ZqJzPV6HW4l6aIz5i8fIAfdVBtxXGbAjQm2Ko4WPHJ3KKTCwkhLedjSFzVatX78e/fr1My1279OnD6ZOnYorV66gcePGpn6VKlXCtm3boNVqceDAAezZswezZs3K87kdHR1NicOgQYOg0/FHLr/99lt07NiR15a9GEFBVa5cOd99lUplgfrnJLdRqKIenRIssTl79iw+/PBDNGzYEBzHYf78+ejbty8uXryYa9aXkpKCfv36oUWLFjh+/DjCw8Mxfvx42NnZYeLEiQAyq0wMHjwYw4cPx59//omQkBBMmjQJ7u7u6NOnT0m+REIIIYTk4HmaAcOPJSI0h+IAXcrJMT/QBVWcC3aLImIYVHeRorqLFEOr2gEA9CyHe0l6XDclO3rcTdIjh8GhHOlY4ORLLU6+1GLOlWT0q6TEmAAHNPIs2kpOpV1B17xYGpVKhT179kCn02HNmjWmdqPRiLVr1/ISG5lMZkoCAgIC8OjRI0yePNlUVa1KlSoAgPDwcNSrVw8AIBKJTOfkVCXM29v7rROL7EpyKpqXlxcuXrzIa0tISIDRaDQbyXlbgiU2O3bs4D1euXIlKlSogJCQEHTr1i3Hc7Zt2wa1Wo0VK1ZAqVSiZs2aePDgAZYvX44JEyaAYRj8888/8PHxMVVZ8Pf3x5UrV/D7779TYkMIIYQI7Hy0Fu+dSES8hl8gQCEGFjVzwYjq9kX2vaQiBnXdZajrLsNI/8zn1Rk5hCcbcDdJb/q6k2T4f5W13OlZYOsjNbY+UiOojBwLmzojwFVaZLESy7Vt2zZ4eHhg69atvPbLly9jxowZWLBggWmqWXZTp05F48aNMWbMGNSvXx9169aFv78/fv31V/Tr18+09qakleRUtMDAQCxevBgvXrwwJVQnTpyAXC5H/fr18x90PljMGpu0tDSwLJtnLfFLly6hefPmUCqVprYOHTpg3rx5ePr0Kfz8/HDp0iW0b9+ed16HDh2wadMm6PV6SKX0JkQIIYSUNI7j8E9YBqaGqMxGTKo4ibGmnTtquxX/32iZmEEtNylqZfteKi2Le6rMROdWgh4no7R4kppzsnM6SovWu2MxpqYDptV3hJOM1uGUZuvWrUPv3r3NlktUrVoVM2fOxI4dOzBixIgcz/Xz80O3bt0wb948bNu2DQzDYPny5ejbty86deqESZMmwd/fH0ajERcvXsSLFy/Mkp3k5GTExMTw2uzt7eHg4FDo11SSU9Hat2+PgIAAfPLJJ/j++++RlJSEWbNm4b333ivSimiABSU206dPR506dRAYGJhrn9jYWJQtW5bX9moIKzY
2Fn5+foiNjUXbtm3N+hgMBiQkJMDHxyfH5w4PD3+7F0DyRD9f20bX37bR9bdt4eHh0BiBHx7JsD/W/LajrZsBs6tnQJ6QivAEAQJ8jQeAIDEQ5AWM8wSeaxicThBjZ7QEzzT85MXAAcvupGFbeAq+99ehgXPhSlSXNgqFAnK53PRYo9Hk0dvyvFrf8iru0NBQhIaGYt68eTm+ls6dO2PNmjUYNGgQDAYDWJY16zd69Gj06tULZ86cQZMmTVCrVi0EBwfjt99+w9SpUxEbGwuFQoGaNWti+vTpGD58OO85Pv30U7Pv+/nnn2P69Om8No7joNfrLepn/iqWtWvXYvr06ejatSsUCgX69euHGTNm5BlrSkpKjsUFqlWrlus5jEqlyudM0+Lz9ddfY8eOHTh06BD8/Pxy7devXz/4+vri999/N7U9e/YMdevWxZEjR9CkSRM0atQIQ4YMwdSpU019zp49i549eyIsLMzianvbgvDw8Dx/CUnpRtffttH1t23h4eFgvPzw3olE3E0ymB2fVt8R0+o7QlQC5W3fBstxOPZCi6W303A6Smt2XMwAsxo5YWJtB4t/LcUtOTnZtLeKRqPhTWkituVtr//rv0v5JfjY6VdffYX//vsPe/bsyTOpATIXH2XP3OLj4wFkjdzk1kcikZhtokQIIYSQ4sFxHPbEiNFub5xZUmMvYbC2nRu+auBkFYmAiGHQqZwCu7u4Y207N5Sz508VMnLA7CspGHYsEck6GrkhRCiCJjbTpk3D9u3bsWfPHlSvXv2N/QMDA3HhwgXesNWJEydQpkwZVKxY0dTn5MmTvPNOnDiBBg0a0PoaQgghpATEqo0YdiwR34XLzfaW8XeW4FgvT/T2U+ZytuViGAa9/ZS41N8Ln9Y2X99w+LkG3Q/EISojnzWmCSFFSrDEZvLkydi4cSNWrVoFFxcXxMTEICYmBmlpaaY+c+fORe/evU2PBw4cCKVSiXHjxuHu3bvYs2cPfvnlF4wbN85UB3vUqFF4+fIlpk+fjrCwMKxduxYbN27EhAkTSvw1EkIIIbZm9xM1mu+MxaHn5nPnB1RS4lgvT9Rwse4PGu0kInzbxBlbO7rDVc4fcbqTZEDn/XEITzYvZU0IKV6CJTarVq1Camoq+vTpA39/f9PX0qVLTX2io6MRERFheuzs7IydO3ciKioK7dq1w5QpUzB+/Hhe0uLn54etW7fi/PnzaN26NRYvXoyFCxdSqWdCCCGkGKm0LEafTsTIE4lI0PKnYynFDH5q7oxVbVzhIBV8FnyR6VxegdO9vdDIg5+oPU8zosv+eFyL0+VyJiGkOFhE8QBSutHiYdtG19+20fW3DWeitPjkdBJe5DAFq4mnFH+0divwhpvWJMPA4oOTSWajVM4yBnu6eqCeu+1s6EnFA8grNlk8gBBCCCHWychyWHgjBX0Ox5slNVIRMK6iDge7e5bqpAbInJq2vr0bRlSz47Un6zj0O5yAu0k0LY2QkkCJDSGEEEIKLCbDiH7BCVhwPRVstrkfNV0kONbTE6PKGyARWX7Vs6IgETH4raULvqjDLyqQqGXR93A8HtKaG0KKHSU2hBBCCCmQO4l6tN8bZ7anCwPgs9oOONHbC3VtaPrVKwzDmPazeV2smkXfwwmIVVO1NEKKEyU2hBBCCMm3Uy816HYgzmzqmYdChP86u2NuE2fIxbYxSpMThmHwbWMnfBxgz2uPTDdi+LEEqA20tJmQ4kKJDSGEEELyZfPDDAwITkBKtr1pWvrIcKaPF9r70kJxIDO5WdjU2WzNzeU4PSaeSwLHUXJDSHGgxIYQQgghb/TXvTR8ciYJ2QccPqphj91dPFDGTixMYBZKxDBY0sIF7cvKee3bH6ux8EaqQFGRnLi4uOT5NXbsWFPfoUOHws3NDSdOnDB7ngULFpjOcXV1RY0aNfDxxx8jMjLSrO+TJ08wceJE1K5dG15eXqhRowZ69uyJjRs3QqfLKhOeW0yrV6/G2LF
j3xi7pZg2bRratm0Lb29v1KlTp9i+T+kuU0IIIYSQt7bsThq+uZRs1v5dYydMqO1g2iSb8ElEDFa3dUOX/XEISzaY2n+4kYpGnjJ0KkcjXJYgLCzM9P+HDx/Gp59+ymt7VbI4Ojoap0+fxrhx47B27Vq0a9fO7LmqVauGffv2gWVZREREYPLkyRg1ahSOHDli6nP9+nX06dMH1atXx48//ojq1asjIyMDDx48wJo1a1C5cmU0a9bM1P+3335Dly5deN/HyckJAwYMwJw5c0xtDRo0wMyZM9G/f/+3/pkUNZZlMWzYMNy9exfHjx8vtu9DiQ0hhBBCcrUkNBVzr6bw2mQi4I/Wruhf2S6Xs8grLnIRNnd0R4d9cUh8bePST04n4WxfL5sY6XIY2bZEv1/ampMF6u/t7W36/1f7prze9srGjRvRoUMHjBkzBk2aNEFiYiLc3Nx4fSQSiencMmXKYOTIkZg2bRpSUlLg5OQEjuMwduxYVK5cGcHBwRCJsiZP1a1bFwMHDjSbqujs7JxjPEqlkrfPC8MwcHJyyrGv0BYtWgQAWLp0abEmNjQVjRBCCCE5WngjxSypkYuBjR3cKakpgEpOEqxr74bXK18naFmMPpUIY/Za2cQicRyH9evXY/DgwShfvjwaNWqEzZs353lOTEwM9u7dC7FYDLE4M4ENDQ3F/fv3MXHiRF5S87qSGAH19fXN82vgwIGmvufPn39j/59++qnYY84PGrEhhBBCCA/HcZh3LRWLQ/lrQZRiBps7uqFNWZpCVVAtfeT4qr4j5l3P+pmeidbh59BUTKnvJGBkJD/OnDmDpKQk05SwoUOHYsWKFRg3bhyvX1hYGHx9fcGyLNRqNQBgzJgxsLfPrJL36NEjAEDVqlVN5yQnJ6NmzZqmx19++SUmTZpkejxmzBiz7xMcHIxatWq91evJy6vpd0DmFLc39Xd1dS10LEWJEhtCCCGEmHAch1lXUrD0dhqv3V7CYEsnd7TykedyJnmTL+s64ky0jrf/z4IbqWhTVo5AL/q5WrL169ejX79+kMky92fq06cPpk6diitXrqBx48amfpUqVcK2bdug1Wpx4MAB7NmzB7NmzcrzuR0dHU2Jw6BBg3jFAwDg22+/RceOHXlt5cqVe6vXU7ly5Xz3VSqVBeovJEpsCCGEEAIA0Bo5fH5ehU0PM3jtTlIG2zq5o6k33Xy/DbGIwcogV7TeHYt4TeZ6G5YDJp5V4XQfr1K7/09B17xYGpVKhT179kCn02HNmjWmdqPRiLVr1/ISG5lMZkoCAgIC8OjRI0yePBkrVqwAAFSpUgUAEB4ejnr16gEARCKR6ZxXidPrvL29izyx8PX1zfN48+bNsX37dgCZU9EGDRqUZ//so0xCocSGEEIIIYjXGPHusUSExPI/LXaWMdjZ2QMNPc1vuEjBlbETY0VrVww6kmBqC0s2YNGNVMxoRFPSLNG2bdvg4eGBrVu38tovX76MGTNmYMGCBaapZtlNnToVjRs3xpgxY1C/fn3UrVsX/v7++PXXX9GvXz/T2puSRlPRCCGEEFIqXY/X4f0TiXiaZuS1u8lF2NnFHfXcKakpSp3KKfBuNTusD88aGVtyKxW9/RSoSz9ri7Nu3Tr07t2btw4GyFwnM3PmTOzYsQMjRozI8Vw/Pz9069YN8+bNw7Zt28AwDJYvX46+ffuiU6dOmDRpEvz9/WE0GnHx4kW8ePHCLNlJTk5GTEwMr83e3h4ODg6Ffk0lPRXt8ePHSEtLQ1RUFPR6PUJDQwEANWrUyHGUqrCoKhohhBBio9L1LGZcSkaHfXFmSU0VJzGCe3hQUlNMvm/iDB9l1m2YkQMmnFVBT1XSLMqNGzcQGhqKPn36mB2TyWTo1q0b1q1bl+dzTJgwAUeOHMHFixcBAI0aNcKpU6dQs2ZNTJs2Dc2bN0enTp2wadMmzJw5E5999hnv/E8//RT+/v68r19++aXIXmNJmDhxIoKCgrB8+XJER0c
jKCgIQUFBiIqKKtLvw6hUKvoXRIpVeHg4qlWrJnQYRCB0/W0bXX/LZGQ57H2qwewryWYJDQAElZFjTTs3uMrf7vNPuv552/9UjeHHE3ltcxs74bM6jgJF9PaSk5NNe6toNBrelCZiW972+r/+u5RfNGJDCCGE2AidkcP68HQ03RmL90+aTz0DgFH+dvivs/tbJzXkzXpUVKJ/JSWv7ccbqXiRbn5dCCFvRmtsCCGEkFIuKsOIf8PS8W9YOmLUbI59ytmL8XNzF3QuT5+wl6QfmznjxEsNkrSZE2jSDRxmXk7G6rZubziTEJIdfRxDCCGElFJhKj1Gn0pEna3RWHgjNcekhgEwOsAeF/p5UVIjAA+FGDMb8qfb7IhQ8/a6IYTkD43YEEIIIaXM3SQ9Ft9Mxc4INXJbSCtmgMFV7PB5HQf4u0hLND7CN7K6HdY8SMfNBL2pbWqICmf6eEEqKp172xBSHCixIYQQQkqJVD2LeddS8Oe9dORWXMtBwmBYNTtMqOWAio50G2AJxCIGi5u5oNP+OFPbfZUBf9xNw8Ta1ltIgJCSRu9ohBBCSCmw76kaU0NUeJmR8xqaas4SjA6wx5AqdnCS0Ux0S9PES2a2t82iG6kYWsUOnkphNnEsLI7jwDA00kQKj+MKV7SZ3tkIIYQQK5aqZzHmdCLePZ6YY1JT202KNe3ccLGfFz4OcKCkxoLNaewEZ1lWQpCi5/DDjVQBIyo4e3t7qFSqQt+YEsJxHFQqFezt7Qt8Lo3YEEIIIVbqRrwOH55KxKMU8/LAFRzEmBfojJ4VFPTpuZXwUIgxtb4TvrmUbGr7JywdH9WwR4CrdayDkkgkcHR0REpKClJSUuDk5CR0SEQgb3P9HR0dIZEUPE2hxIYQQgixMhzH4Y+76Zh9JRm6bIM0EgaYUNsBU+s7wk5CozPW5uMa9vj7Xhoep2YmqywHzLycjO2dPQSOLP8kEgmcnZ0RGxuL8uXLCx0OEYgQ15/e8QghhBArkqAxYtixRHx1yTypCXCR4GRvL8xp7ExJjZWSiRl824Rf/vnoCy2ORmoEiogQ60HveoQQQoiVOButRevdsTj03Pwmd5S/HY718kRtN+uYskRy16OCAi19ZLy2GZeTYcyt1B0hBAAlNoQQQojFM7IcFlxPQe9D8WYFApxkDP5t64YlLVxplKaUYBgG85o44/WVUfdVBmx6lJHrOYQQSmwIIYQQi/Yi3Yheh+Kx8Eaq2d40jT2lON3bC30rKYUJjhSb+h4yDKnCv64LrqVCbaBRG0JyQ4kNIYQQYqEOPlOj1e4YnI/R8doZAF/UccDB7p7wo002S62vGzrh9ercLzKMWHUvTbiACLFwgiY2586dw9ChQxEQEAAXFxds2LAhz/4LFiyAi4tLjl9xcZm79Z45cybH4w8ePCiJl0QIIYS8Na2Rw/SLKgw7logkLf8Tei+lCDs6u2N2Y2dIRVTGuTSr4CDBxwEOvLafQlOh0ua8CSshtk7Qj3nS09NRs2ZNDBs2DJ988skb+0+cOBEffPABr+2DDz4AwzDw9PTktYeEhMDV1dX02MPDesokEkIIsV0Pk/X44GQSQhP1Zsfal5XjjyBXeFnZTvSk8CbVdcC6B+lI0WcmuCodh19vpWJ2Y+c3nEmI7RE0sencuTM6d+4MABg3btwb+zs4OMDBIeuTi8jISFy4cAErV6406+vp6Ql3d/eiC5YQQggpZpseZmDyBRXSs62jkDDAzEZOmFjbASLabNOmuCnE+KyOI767lmJq++NuOsbUdICPHSW4hLzOqtfYrFu3Ds7Ozujdu7fZsbZt28Lf3x+9e/fG6dOnBYiOEEIIyZ90PYsxpxMx9kySWVJTwUGMQz088VkdR0pqbNQnNe3hrcy6ZVMbOfxyK1XAiAixTIxKpbKI8hq+vr748ccfMXz48Hz1Z1kWdevWRa9evbBgwQJTe3h4OM6cOYOGDRtCp9Nhy5YtWL16Nfbt24eWLVvm+nzh4eF
v/RoIIYSQgorXAV/eleNemvmn7x09DPi6qg5UH4Bsi5Lgx0dZe9vIGA67GmvgKbeI2zhCSky1atVyPWa1b5XBwcGIjIzEe++9x2uvVq0a7wUHBgbi2bNnWLp0aZ6JTV4/JPJ2wsPD6edrw+j62za6/nm7r9Jj9JEEPE8z8tqVYgYLmzljRDU7MFY8SkPXv+hMqsxhY3QMItMzf1d0HINdaR74sbaLsIHlga6/bRPi+lvtVLQ1a9agadOmCAgIeGPfRo0a4fHjxyUQFSGEEJI/56O16Lw/ziypCXCR4HgvT7xX3d6qkxpStORiBl/WdeS1rXmQjpfpxlzOIMT2WGViExUVheDgYLPRmtzcunUL3t7exRwVIYQQkj9X43QYfCQBKTr+NKJOvnIE9/REgKtUoMiIJRtezQ7l7LOmLGqNwBJaa0OIiaCJTVpaGkJDQxEaGgqWZREZGYnQ0FA8f/4cADB37twcCwOsX78e9vb26Nevn9mx5cuXY9++fXj06BHu3buHuXPnYv/+/fj444+L/fUQQgghb3InUY8BwfFIy1YkYJS/HTZ1dIej1Co/cyQlQC5mMCn7qE1YOl7QqA0hAARObK5fv46goCAEBQVBrVZjwYIFCAoKwvz58wEA0dHRiIiI4J3DcRzWrVuHQYMGwc7Ozuw59Xo9Zs6ciZYtW6Jbt24ICQnB1q1bc0yQCCGEkJL0OMWAfsHxUGUbqfmmgSN+bu4CCW24Sd5geDU7lHfIGrXRscCKO2kCRkSI5bCYqmik9KLFg7aNrr9to+ufJVFjRLu9cXiabU3N5LqOmNHISaCoihdd/+Lxb1g6Pj+vMj12lDK4NcgHLnLLGu2j62/bqHgAIYQQUgrpWQ4jTySaJTWjA+zxTUPHXM4iJGdDq9jB67V9bVL1HP4NSxcwIkIsAyU2hBBCSDGbcSkZZ6J1vLYhVZT4oakzVT4jBaaQMBgd4MBr++NuGrRGmoRDbBslNoQQQkgxWvcgHSvv8T9Nb+4tw9KWrhBRUkMK6cMa9rCXZP3+RKtZbHucIWBEhAiPEhtCCCGkmFyL02HSBRWvrZy9GGvbuUEmpqSGFJ6rXIQR1flFlJbeSgPL0agNsV2U2BBCCCHFIEFjxHsnEqFjs9qUYgbr27vBUynO/URC8mlcLQe8nh+HJRsQHKkRLiBCBEaJDSGEEFLEjCyHj08lITLb/iJLW7mgvodMoKhIaVPBQYJ+lZS8tj/vUhEBYrsosSGEEEKK2MKbqTj+UstrG1vTHgMrm++/RsjbmFCLX0Tg+EstHiUbBIqGEGFRYkMIIYQUoYPP1PjxRiqvrZmXDN82cRYoIlKa1feQobGnlNf2D5V+JjaKEhtCCCGkiNxL0uPjU0m8Nk+FCKvbukEqomIBpHh84G/Pe7zhYTrUBioiQGwPJTaEEEJIEUjUGDHsWALSXruhFDPA323dUNaeigWQ4tOvkh1c5VmJc5KWw84IKv1MbA8lNoQQQshb0rMc3j+ZhCep/GIB8wOdEVRGLlBUxFYoJQyGV+WP2qym6WjEBlFiQwghhLwFluPw6TkVTkfxiwW8V90OowPsczmLkKL1QQ3+79qVOD1uxOsEioYQYUiEDoAQQkoNnRZMciIYdQagVYPRagCDHhCLAbEEnEQC2DmCdXYF7J0AEX22ZO04jsOUkGRsesif9tPcW4bFzVzAMLSuhpSMyk4StC8r51XjWx2Wjt+ovDixIZTYEEJIful1YOKiIIp9AVH0CzCxLyCKeQFRQjQYVSIYdf6nfnBiMThnd7BlK4ItWyHzv37VwZavCkjordkacByHWVdS8Pd9/nUv7yDG2nZukIkpqSEl68Ma9rzEZmeEGgsCnWEvpQ9RiG2gv56EEPI6nRai2JdgYiIzk5bYF2Be/TchFgxXNJWGGKMRTGIsRImxwO3LpnZOJgfr5w9j9Tow1GkCtmptSnQskJHlMPNKMpbf4Sc1PkoRdnfxgKeSigWQktelvALeShFi1Cw
AIFXPYd8zDYZUof2TiG2gv5aEENujyfh/8pKZsIhispIXUWKcoKExOi3ED0IhfhAK2b4N4BR2MNZqBEPdpjDWCQTn7iVofARI1bP46FQSDj/X8Nrd5SLs6uqByk70p5UIQyJiMKSKHX67nWZq2xieQYkNsRn07ksIKb20GohePoHoeQREkY8hioyA6EUERKqEYvl2nFgMztEVnL0DIFeCUygBsQRgWcBoAGPQg0lNBpOSlO9pa4wmA5KrZyC5egYAYPT1g7Fu08yv6nUAifQNz0CK0rM0A4YdTcCdJP7O7s4yBju6uKOGC10PIqxhVfmJzekoLZ6lGVDBgW75SOlHv+WEkNLBoIfoaTjE4bchDr8N0fNHYGJfFtnUsVdYNy+wPuXAefmC9fYF6+ULzqsMWFePghUEeDXlLeopRC+eQvz8EUQP77wx6RK/eALxiyfAwS3g5AoYazaCoW4gjHWbgvPwefsXSHLEcRzWhWfgm0vJSNXzf6fK2ImwqYM76rnTIm0ivABXKRp6SHEtXg8A4ABseZiBKfWdhA2MkBJAiQ0hxGox0ZGQhIZAfPMixA9Cwei0bz7pDThGBM7DOzNh8c5KXlhvX3CeZQBZEe1JIpODLVcJKFcJxiaAHgA4DkxCDMQPbkF8+wrEty5BlJKU61MwWg0k189Bcv0cAIAtWxGGWo3AVqsNY7Xa4Nxo2lpRiEwz4PPzKhx9Yf77Vc9dik0d3GkDTmJR3qlqh2vxyabHmx5mYHI9R6rSR0o9SmwIIVaFiX4O6fmjkFw8DlH080I9BycSgfMsk5WwvEpefMpljnoINb2LYcB5+MDg4QNDi04Ay2aOQt26BEnoRYge3gXDsbmeLnr5FLKXT4EjOwAArKsH2HKVwfr6gfWtBM7TB6ybJzhXz8InaKwRMBgyy1izbOZUO7E4s8CBqHTd3KfpWfx2Ow1Lb6VBbTQf+etZQYGVQa5UcYpYnAGV7fD1pWTo/v928TjViIuxOjTzps1iSelGiQ0hxPJp1ZCcC4b09EGII+7n+zSOYcB5lQXrWwls+cowlqsMtlwlcF6+1lFpTCQCW8kfbCV/6HuPANJTIfn/SI741qU3TlsTJcVDlBQP3LpkdoxTKMHJlYBCCU6mADgWjNGYmayw//+v0QiwRjAGnSmZYdjcEytOoQRn7wTOwQmckws4jzLwEskgVr3M/Ll7l7OKvXv0LIdNDzMw/1oKotXmr9dOwmBOIyd8FGAPEX0CTiyQq1yEbhUU2P0kq8DFxocZlNiQUs8K/rITQmwVE/sS0mO7ID29H0zGmxfbsy4eMFavA7Z6HRgrB4At5wfIlcUfaEmxd4ShaTsYmrYDOA6iZw8hDn01mnM7z6QjO0ajBqNRA8lv7lvg50yIMbX5AsCx7QAATq4AW74KjJX8YfSvB6N/PcDJpegCeEs6Y2ZC81NoKp6lGXPs09xbhuWtXFGJKp8RC/dOVXteYrMzQo2FTV2glFAyTkovemcmhFgcJuoZZHvWQXLhWJ5TrzixBEb/ujDWaw5D3UBwZSoAtvIJOsOArVgNbMVq0PcaDmSkQXz/BsThdyB+eBuiiPtg9Hqho+RhtBqIH96B+OEd03Q5Y1k/GOs0gbFeMxj96woyDVBn5LDx/wnN81wSGg+FCN80cMJIfzsapSFWoYOvHF5KEWJf29MmOFKDPn6l6MMeQrKhxIYQYjGY6EjIdq/JM6HhGAZG/3owNO8IQ5M2gL1jCUdpoewcYGzYCsaGrTIfG/QQRT//f4nrJxBFPQOTGAcmKR5MckLmtLNC4qRSQCLLTCKNBsBoBGMoXBIlfvkE4pdPgMPbwCnsYKgTCEOzDjDWDSy6Qg250Bo5bAjPwM+hqYhMz/nnIRcD42s54PM6jnCSWf40OkJekYgY9PNTYuW9rNHu7Y8zKLEhpRolNoQQ4aWlQLZ7LaTHduZ6w806usDQtif07XrTJpX5IZFmFg4oV9n8GGsEtJrMaWOajMxqcowInFi
cuQZGJM4sCPD//+ekssw1SRJp5rGcRixYFlCng0lLyfxKioMoLgop4ffgqk6B+NlDMGkpeYbMaDIgvXwS0ssnwSntYWgclJnkBNTPLFJQRLRGDuvD07EkNC3XhEYmAkZUt8fndRxQnvb/IFZqUBU7XmITHKlBso6FMyXppJSid2tCiHAMBkiP74Js1xow6ak5djGW9YO++xAYmrYv9k/wbYZIDCjtwSntAWTuc/H2zykC7B3B2TuC8/YFEAAjgMiq4VBWq5ZZyjoxDqInYZnlrO/fgOjpw1xH5hh1OqRnDkJ65iBYJ9fMtUXNO4GtXKPQ0w0NbOYIzY83UvEiI/eE5r3/JzTlKKEhVq6RhxR+jmI8Sc38fdcagX1P1RhezV7gyAgpHvSuTQgRhCj8NuT//gxx5OMcjxt9/aDvMzJzupkVVNIib8Aw4Ny9YHT3grFR68y2jDSI71yF5GYIxKEhECXnvGePKCUJsiM7IDuyA8aK1aBv3weG5h3yXRiC4zjsearB99dSEJ5syLGPXPwqoXGEL+1JQ0oJhmEwsJIdFodmfXC0/TElNqT0osSGEFKy0pIh3/onpKf253iY9fCGbuDozMpflNCUbnYOMDZpA2OTNpl79jy+B0nIMUgunsh1Y1Lx03CI/1kMbssK6Ft2yZya6OuX67e4l6THlxdUuBCjy/G4XAyM/H9CQ5tsktJoYBUlL7E5FaVFTIYR3nb0+05KH0psCCElRnz5JORrf83xppVT2EHX+13oOw2gKWe2SCQCW7UWdFVrQTdsHMT3bmQmOVdP51jqm8lIzxrFqVEPuo79MkeC/r9JaIaBxeKbqfjtVhoMOcy1k4uBUf72+KyOI8rQDR4pxWq4SFHbTYrbiZkFPlgO2PVEjTE1HQSOjJCiR4kNIaTYMcmJkK/9BZIrp3M8rm/RGbqhn4BzdivhyIhFEktgrN0YxtqNoX3vc4hDL0F65gDEN0PAcOZZivj+TSjv3wTrVRa6bkNwPaA93j+fjkcp5utoRAwwvKodpjdwoilnxGYMqqw0JTZAZnU0SmxIaSToPI9z585h6NChCAgIgIuLCzZs2JBn/6dPn8LFxcXs6+jRo7x+Z8+eRZs2beDt7Y169eph9erVxfkyCCG54Ti43gqB3Vfv55jUsGUqQD19CbRjvqakhuRMJoexcWtovliAjEUboes5HKyTa45dRbEvoVizBJVnDkfvO3ugNGp5xzv5ynGhrxeWtnKlpIbYlP6V+OvRLsfp8SQ15/VmhFgzQROb9PR01KxZEz/88AOUyvzXVf/vv/8QFhZm+goKCjIde/LkCQYPHozAwECcPn0aX375JaZOnYrdu3cXx0sghOSCSYyFYslX8Nv9N5h0fplfTiyBtu/7yPj+bxgDGggUIbE2nGcZ6AZ9jIwlW6EZOxNG/3o59vPWp2DRo40ID/kCnz0/iEpyA9a0c8PWTu7wdyn5DUAJEVp5Bwmae8t4bbufqAWKhpDiI+hUtM6dO6Nz584AgHHjxuX7PDc3N3h7e+d47J9//oGPjw8WLVoEAPD398eVK1fw+++/o0+fPm8fNCEkbywLyal9kG9ZCUZtvjbCWKkGtB9NzXl/FULyQyKFoVkHGJp1gCjyMYz7t0Jx4QikHH/qmY8+GT89Wo8fEw5BX/5DGCp0BhgaqSG2qa+fkldEY9cTNT6rQxsck9LFKksOjRgxAlWrVkWXLl3MRmIuXbqE9u3b89o6dOiA69evQ68v3M7YhJD8YV4+hXLBZ1D8+7NZUsNJZdAO+QTqmb9TUkOKzEOnCmjo/j6qNPsFi8v3QKpYYdZHrIqHYtVCKGePhvj2FQGiJER4vf2UeH0HqOvxNB2NlD5WVTzAwcEB3333HZo1awaJRIIDBw5g1KhRWLFiBYYMGQIAiI2NRdu2bXnneXp6wmAwICEhAT4+Pjk+d3h4eHGHb9Po51u6MUYDvM4fgs/Z/RAZzf9QppWvimc9R0Lr7gM8jhAgQiKk4vr
3H5bGYOIdBZL0DCB3w/Qq72BR+V74LnofPnh+BBI9f42N+NkjKBdNhsq/ASI7D4He2b1Y4iJ89P5vOeo7yXE9JWvU8u8rz/FeueJNbuj627biuP7VqlXL9ZhVJTbu7u6YOHGi6XGDBg2QmJiIX3/91ZTYAJkbUr2O+38Vneztr8vrh0TeTnh4OP18SzHRwzuQr10E8YsnZsc4uQKRbfvCdehoVKA9aWxScf37v5ukx6eX45GkZ3ntdfw80HPUl9BoPoLswCZIj+wAY+CP1ruEXYdzxD3o+r4HfZdBgITW3RQXev+3LMP0abh+Mdn0+EyqPb6r5lVs34+uv20T4vpb/Z1Go0aN8Phx1s7lXl5eiI2N5fWJj4+HRCKBmxtVXSKkyGSkQbb+Nyi/n5BjUmOo1wwZC9YgvkkH2miTFKmHyXr0PRyPRC0/qRlcRYmtHd3hJBMBTi7QDR2LjB/WQt+8o9lzMDoN5Fv/hHL2aIgi7pdU6IQIKvt0tBsJNB2NlC5Wf7dx69YtXiGBwMBAnDx5ktfnxIkTaNCgAaRS+lSOkLfGGiE5uQ92U9+F7MgOs31FWEcXaMbOhOaLBeDccy7yQUhhPUk1oPeheMSq+UnN2Jr2WNnaFTJxthF7zzLQfjIDGbP/gLFygNnziSMjoJw7DrKtKwGd1uw4IaWJj53YrDrargiqjkZKD0ETm7S0NISGhiI0NBQsyyIyMhKhoaF4/vw5AGDu3Lno3bu3qf/GjRuxbds2hIWFITw8HEuXLsWqVaswevRoU59Ro0bh5cuXmD59OsLCwrB27Vps3LgREyZMKPHXR0hpI753HcrZY6D4ZzFEqSqz4/pWXZHxwxoYmnUA8pj6SUhhRKZlJjUvM/hJzcc17DE/0DnP6cZs5RpQz1wGzajJ4OydeMcYjoVs/ybYzfoYoicPiiV2QixFXz/+9hq7qOwzKUUEXWNz/fp19OrVy/R4wYIFWLBgAYYNG4YVK1YgOjoaERH8hcaLFy/G8+fPIRaLUaVKFfz++++89TV+fn7YunUrvv76a6xevRo+Pj5YuHAhlXom5C2IHt+HbPsqSO7kXFGK9SwL7agvYazVuIQjI7YiOsOI3ofi8SyNX9J5RDU7LGyWd1JjIhLB0LYnDI1aQb5pBaTnDvMPRz2D8ttx0A0eDX3ngTSFkpRKvf2UmHYxGa/G2l9NR/NztKpl14TkiFGpVNybuxFSeLR40HqJHt2DbN8GSK6dzfE4J5ND130Y9N2HAnLzMrsAXX9bVxTXP15jRM+D8biv4q8FGFxZiRWtXSEWFW50UHwzBPJ/FkOUFG92zFCnCbQffwXOmdZmvg3692+Zuh+Iw/nX9rSZ08gJn9ct+j1t6PrbNioeQAgRHsdBfOsSFD98Abtvx+aa1Oibts9cmN3v/VyTGkLeVpKWRd/DCWZJTR8/BZa/RVIDAMZ6zZAx/1/og7qbHZPcugzljA8hDr1Y6OcnxFL1q8SfjraTpqORUoLGHQkhmdKSIT0bDOnJvRBFPcu1m7FGPWgHfgy2Wu0SDI7YomQdiwHB8bidyC/X3LW8An8FuUHyFkmNiZ0DtB9OhaFuIBSrF4PJSDMdEqUkQfnTNOg6D4Ru8GhAKsvjiQixHr0rKjE1JGs62s0EPSJSDKjkRLeFxLrRbzAhtozjIHpwC9ITeyC5cgqMXp9rV2OlGtAN/AjGWo2oMAApdml6FkOOJOBaPP93sl1ZOf5t62ZW/extGZu0RUblACj++B7iB7d4x2TB2yG+fwOa8XPA+ZQr0u9LiBC87cRo4SPDueis6Wi7nqjxRTFMRyOkJFFiQ4gtSk+F9NxhSE7sg/jlkzy7GgIaQN9jGIy1m1BCQ0pEmp7FsKMJCInV8dpb+siwoYMbFJLi+T3k3L2hnr4E0r0bINu1BgyXVX1N/Owh7GaPhvaDyTA0bV8s35+QktTPT0mJDSl1KLEhxFZwHEQP70B6Yi8kl06A0ety7yoWw9AoCPpuQ8BWrlG
CQRJbl6RlMehIPK7E8UdqAj1l2NzRHXaSYl4aKpZA33ckjDUbQPHHPIgSYkyHGE0GFMu/hf7+TWiHjQNk8uKNhZBi1KuiElOyTUd7nGJAZZqORqwY/fYSUtrptJBcOArpkR0QP3+UZ1fWswz0bXvC0LobVYMiJS46w4j+h+NxN1uhgHruUmzt5A5HacnVu2Gr10XGd6ug+GcxJJdP8Y5Jj++G6OEdaCbMAedNU9OIdfK2E6Oljwxns43afEmjNsSKUWJDSCnFpCRBeng7pCf2gklPybUfJxLB2LAV9G17Za6fob07iADuJunxzrEEPEnl71NT312KHZ3d4SIX4PfS3hGa8XMgOb4b8o3LwBiyRpHEzx7CbtZoaD6cCmNg25KPjZAi0NdPyU9sIiixIdaNEhtCSpsUFWQHN0N6dBcYnSbXbqyHN/RtesIQ1B2ci3sJBkgI386IDEw4q0K6gb+tWksfGTZ1cIeTTMBkm2Fg6NAXbJWaUPw+B6K4l1mHNBlQLpsD3f2+0A0dS1PTiNXpVVGJqReTwf7/n15oIk1HI9aNfnMJKS10Wsj2bYD00FYw2pwTGo4RwdigOfRte8NYpzEgEpdwkIRk0bMcvr2agqW308yOdS2vwD9t3aAspkIBBcX6VUfGt39CsXqR2dQ02bFdED+8C8342eC8fQWKkJCC87YTo4U3TUcjpQclNoSUAuIbFyBf/xtEcVE5HufsHKBv2xP69n3AeZYp4egIMXcjXocJ51Rme9QAwHvV7fBTcxdIi2KfmqJk5wDN+DmQHt0J2eYV/KlpTx/AbvZoaD6YQlPTiFXpV4k/HW03JTbEilFiQ4gVY5ITIV+zBJKrZ3I8zjq7Qd9jGPRtewJyZY59CClJagOHxTdT8MutNBj5M88gEwGLmrlgpL+9MMHlB8NA36k/jFVrQrFsLu/DBEadDuWyOdDf7gntO+MAhZ2AgRKSPz0rKDH5Ar862pNUA/wc6RaRWB9aJUyIlRLfuw7lzI9yTGpYJ1doh41HxuJN0HcZREkNERzHcdj9RI3AnTH4KdQ8qSlrJ8KB7p6WndS8hq1UAxlz/4ShcZDZMempfbCb9TFEj+4KEBkhBeNtJ0Zzbxmvbe8TtUDREPJ2KLEhxNqwRkh3/gvFwkkQJSfyDnGMCLpOA5CxcB30XQfRYmZiEe4k6tH7UDxGnkjE8zSj2fHBVZQ428cLjT1lOZxtwewdoZkwF9p3PwUn5n+6LYp5AeX3EyDbsRrIY88oQixBHz/+h1+7n1JiQ6wTjTMSYk3UGVAsmwPJrUtmh4yVA6B9/0uwFasJEBgh5pK0LH58JMV/52JNVZdeV8ZOhCUtXNC1vBWPKJqmptWC4o/vIYp+nnWIZSHbvRbiK6eh/XAq2Co1BQyUkNz1qqjEtIvJpsdX4vR4nmZAeQe6TSTWhUZsCLESTEoSlAu/MEtqOIaBrvcIqGcspaSGWASW47AmLB0N/4vGtiipWVIjZoBPatrjQl9v605qXsNW8kfGt39C376P2THxiydQfjceso3LAHWGANERkrey9mIEZhsx3fs09+0CCLFUlNgQYgWY2JdQfj8B4ogwXjvr5ArN5EXQDfgQENMna0R44cl69DwYj8/Oq5CkNR+maVdWjnN9vfBDUxdhNt0sTnIltCO/gPrLH8A6u/IOMRwH2eFtsJs+ApLzRwAuhyEsQgTU20/Be7yXpqMRK1TK/qoQUvowL55A+f0EiGJe8NqNlfyh/m4VjLUbCxQZIVmMLIcloalotTsW52PM15T4OYqxob0bdnR2Rw0XqQARlhxjvWbImL8G+lZdzY6JVAlQrJwH5fzPIHryQIDoCMlZ72zrbEJidIjKMF8TR4glo8SGEAvGxEVB+eNksyIBhtpNoJ6+BJyLu0CREZLleZoBvQ7FY+7VFGiz3QcpRBxmNXJCSF9v9KioBMNY2N40xcXBCdqPp0M9eRFYD2+zw+IHobCbPRry5XPBREcKECAhfBUcJGj
okfWhAwdgH43aECtDiQ0hFopRJUD54ySIVPG8dn3zjtB8MZ/2yCAWYVeEOtdRmo6+cmxtqMGXdR2hkNhIQpONsU4TZMz/F7pe74KTmI9USS+egN3XIyH/5ycwuWywS0hJMauORmWfiZWhxIYQS5SWAsWiyRDFvuQ169v1hnb010AON0iElCQ9y2FaiArvn0xEso6/XsRNLsKfQa7Y1skdZRS0lgRyJXQDP0LGvH9gqNfM7DBjNEJ6ci/spg6HfOU8MC+elHyMhADoXZGf2JyP0SFWTdPRiPWgxIYQS6PXQbnka4gjI/jNzTtC+97ngIj+2RJhxaqN6HMoHivvpZsda19WjvN9vTC4ip3tTDvLJ86nHDRf/gD15B9hrFDV7DjDspCePwL7r9+HYtEUiK+dBVi6qSQlp5KTBHXcsj44YzlgP1VHI1aEyigRYkk4DvJ1v0L88Dav2VC/ObQfTaekhgjucqwO751IQFQGy2uXiYDZjZ0xtqY9RJTQ5MlYJxDqWo0huXQCsv/+NhuZBQDJ7cuQ3L4M1t0b+na9YAjqDs7ZTYBoia3p46fErUS96fGep2qMqmEvYESE5B/dJRFiQSTHd0N6aj+vzehfD5rxcwAJfQ5BhPVvWDp6HIwzS2rK2YsR3MMT42s5UFKTXyIRDM06IGPBWmhGfw22bMWcuyXEQL59Fey+GAz5iu8gCgulUtGkWPXJVvb5dJQWiRoaOSTWge6UCLEQovs3Id+wlNfGepaF+tPvAJlcoKgIAbRGDlNCVFj7wHxzyTZl5Fjd1hXuCrEAkZUCEgkMLTvD0LwjxNfOQXZgE8SP7pp1Y4wGSEOOQRpyDGyZ8tAH9YChZWcaxSFFrpqzFDVdJLirMgAAjByw/5kGI6rTqA2xfDRiQ4gFYBJjofh9Nhhj1qdinEIJzefzAAcnASMjtu5FuhHdD8TlmNR8WtsB/3V2p6SmKIhEMDZuDfWs5ciY+yf0Qd3B5fKBhijqOeRb/oDdF4Og+HUGxDfOA0ZDCQdMSrPse9rsoepoxErQiA0hQmONkK+cD1GqitesGf012HKVhImJEABno7UYdSIRcRr+1DN7CYPfW7mgXyUqOV4cWL/q0H44FdqhYyE9cwjS47vMNugFMqupSa6dheTaWbAuHjC07gp9627gvH0FiJqUJn38lPjhRqrp8ckoLVRaFi5y+jycWDZKbAgRmHT/Zkju3+C16fqOhLFRa2ECIjaP4zj8eS8dX19KhjHbco7KjmKs7+COmq5UcrzY2TtC33UQ9J0HQHz3GqTHd0N84zxvZPcVkSoesr3rIdu7HoYa9WEI6g5DkzY0jZUUSg0XCao5SxCenDkSqGeBQ881GFqVPswglo0SG0IEJHp0D7Kdq3lthlqNoOszUqCIiK3TsxymhqjwT5j51LMu5eRYGeRGn9qWNJEIxtqNYazdGExyIiTngiE9fQCiqGc5dpfcvwHJ/Rvg1v8KQ7OO0LfpAdavegkHTawZwzDoU1GJxaFZoza7n6gpsSEWjxIbQoSizoDij+/462ocnTM34KSyzkQASVoWI08k4nSU1uzYtPqOmFbfkaqeCYxzdoO++1Douw2B6OEdSE8fgOTicTBa871GmIx0SI/vhvT4bhgrVIUhqDv0zTvSuj2SL739FLzE5vhLDVJ0LJxk9PeJWC5KbAgRiHzj72b7V2g+mArOxV2giIgte5isx5CjCXiUwp/m5ChlsDLIFd0rKHM5kwiCYcBWqw1ttdrQvjMBkksnID19AOKHd3LsLn72EOL1v0G2ZQUMjYJgaNsTxhr1AUpUSS7quElRyVGMiNTM9wStEQiO1GBgZRq1IZZL0LT73LlzGDp0KAICAuDi4oINGzbk2f/MmTMYNmwY/P39UaZMGbRo0QLr1q0z6+Pi4mL29eDBg+J8KYQUiPjWZUhPH+C16Tr0hbFhS4EiIrbs1EsNOuyLM0tqKjhk7k9DSY2FU9rB0KYH1DOXIX3+v9B1HQzO0Tn
HroxeD2nIMSh/+AJ2X70H6aFtQFpKCQdMrAHDMOiTrTrabqqORiycoCM26enpqFmzJoYNG4ZPPvnkjf0vXbqEWrVq4bPPPoOPjw+OHTuGzz//HAqFAoMGDeL1DQkJgaurq+mxh4dHkcdPSKFoMiD/dzGviS1bEbqhYwUKiNiy1ffTMSVEZVYkoLm3DOvau8GDSjlbFc7XD7ph46Ab9DHE189njuLcugyGY836iqKeQ75pGWTb/4ShSTvo2/cGW7UWjeIQk94VlfjlVprp8dFILdL1LOylNB2NWCZBE5vOnTujc+fOAIBx48a9sf+kSZN4jz/88EOcOXMGe/bsMUtsPD094e5OU3qI5ZFtXwVRfIzpMceIoPloOlUvIiWK4zh8ezUFS167aXllWFU7/NLCBXIx3eBaLYkUxiZtYGzSBkxiLCRnDkF65iBEcVFmXRm9HtLzwZCeD4axXGUY2vWCvkUnwM5BgMCJJWngIUU5ezEi0zNHc9VGDkdfaM1GcgixFFa/xiY1NRVly5Y1a2/bti10Oh38/f0xefJkBAUF5fk84eHhxRUiAf18X7F//hDVjuzktcUFdsALVgKU4p8RXX/LYmCBeQ9l2BfL/xPAgMMEPz1GeGfg2eP4Ivt+dP0tQM3mQEBTODwJg8e1U3AJuwGGNS8bLY58DPG6XyHZvAJJtQKRUL8VMnwrv9UoDl1/6xbkLMXG9Kzy7utvxaKmXpfv8+n627biuP7VqlXL9ViBE5vg4GB07NgRIguo2nTo0CGcOnUKhw8fNrX5+Pjg559/RsOGDaHT6bBlyxb06dMH+/btQ8uWua9fyOuHRN5OeHg4/XwBQKeF3d/fgUHWnB/WsyzsPvwS1eSl99Mvuv6WJcPAYtSJRByO5Vc+s5Mw+DPIDT0rFu3vIl1/C1PdH+jcGxmqhMxRnJN7IYqPNusm1uvgceMsPG6cBetTHvpWXWBo0Rmcu1eBvh1df+v3vrMWG19mfdBxXiVFuUoVoJS8Odml62/bhLj+BU5shgwZAk9PTwwYMABDhgxB/fr1iyGsNwsJCcHHH3+MhQsXolGjRqb2atWq8X6IgYGBePbsGZYuXZpnYkNIcZMe3GK274T2g8lAKU5qiGVJ1bMYciQB52P4n7a6y0XY1skdDT1lAkVGShrn4g59r+HQ9xgG8e3LkJ7YA/H1CzmvxYl+Dvn2VZD99zeMAQ1gaNkFhiZB9N5lIxp7ylDWToSXGZm/G+kGDsdeaIr8QxBCikKBh102b96M1q1bY+3atWjfvj2aNm2KJUuWIDIysjjiy9GFCxcwaNAgfPXVV/jwww/f2L9Ro0Z4/PhxCURGSM6Y+GjI9vGr/unb9ICxZkOBIiK2RqVl0fdQvFlSU8FBjMM9PCipsVUiEYx1m0Lz2Txk/LwZ2r7vg3XzzLErw3GQ3L0GxV8LYD+xH+Qr50N86xJgNJRw0KQkiRgGvbIlMXuoOhqxUAVObLp06YK///4bYWFhWLp0KcqUKYPvv/8e9erVQ69evbBhwwakpqa++YkK6dy5cxg0aBCmTp2ar4IDAHDr1i14e3sXW0yEvIl84zIwuqypP5yjM7RD3lwJkJCiEK8xoteheFyN1/Paa7lKENzDE1WdpbmcSWwJ5+YFfb/3kbF4E9Sfz4ehUWtw4pwndjBaDaTng6FcPBV2XwyCbP1SiB7dAzgux/7EuvXOVizg0HMNtNlLKRJiAQpdPMDBwQHDhw/H8OHDER0djW3btmHLli2YOHEipkyZgu7du2PYsGHo0KFDrs+RlpZmGklhWRaRkZEIDQ2Fq6srypcvj7lz5+Lq1avYs2cPgMw9aoYMGYIPP/wQgwcPRkxMZmUpsVhsKue8fPlyVKhQAQEBAdDpdNi6dSv279+PtWvXFvalEvJWxLcuQXL1DK9NO2g0YO8oUETElkRlGNH3UDzCkvmfqjfykOK/zh5wkQu/XpJYGLEExgYtYGzQAkhLhjTkOCRnD0MccT/H7qLkJMi
O/AfZkf/AevvC0Lwj9M07gfMpV8KBk+LSzEsGL6UIserM6Wgpeg4nX2rRpbxC4MgI4SuSv2h6vR46nQ46nQ4cx8HR0REXLlzAwIED0aJFC9y+fTvH865fv46goCAEBQVBrVZjwYIFCAoKwvz58wEA0dHRiIiIMPXfuHEjMjIysHTpUvj7+5u+2rVrx4tl5syZaNmyJbp164aQkBBs3boVvXv3LoqXSkjB6HWQr/uN12SsEgBD624CBURsybM0A7ofiDNLapp7y7CzCyU1JB8cnKHv2A/qOX9kbv7ZYxhY19z3hRPFvIBs1xrYT3sXyjmfwPPSUTCqhBIMmBQHsch8Ohpt1kksEaNSqQo1lpicnIxdu3Zhy5YtuHjxIqRSKbp27Yphw4aZqqYdPnwY06ZNg5ubG06cOFHUsRMrYctVUaT7NkC+7S/TY45hoJ69AmylGgJGVbJs+foL6XGKAb0PxZv2n3ilXVk5NnRwg52kZJIauv6lEGuE+P5NSC4cheTyKTDq9Dy7c4wIxlqNYGjeEYZGrQGlXQkFSorSqZda9DmcVR3NWcYgfGgZyPLY74r+/ds2q6iKtn//fmzZsgXBwcHQarVo3LgxFi1ahP79+8PFxYXXt2vXroiNjTXbWJMQW8AkJ0K2dz2vzdC2p00lNUQY91V69D0Uj2g1v8JV1/IK/NvWDYp8lGklJFciMYw1G8JYsyG0Iz6D+GYIpBeOQnwzBIxBb9ad4VhIbl+G5PZlcGt+hqFZB+jb96b3QivT0kcGd7kICdrM95VkHYcz0Vp08KXpaMRyFDixeffdd+Hr64vx48dj2LBhqFq1ap79a9WqhUGDBhU6QEKslWznP2A0WUP1nL0jtAM/EjAiYgtCE3TodzjBdPPxSl8/Jf5q4wqpiJIaUoRkchibtIGxSRsgPRWSy6cyR3Lu38ixO6PTQnr6AKSnD8DoVx36jv1gaNYBkFJVPksnETHoWVGBNQ8yTG27n6gpsSEWpcCJzc6dO9GmTRsw+dyFuFGjRrx9ZgixBcyLJ5Cc3M9r0/UdCTg4CxQRsQVX4nQYEByPZB1/hvHQKkr83soVEkpqSHGyd4ShbU8Y2vYEkxALScgxsCf3Qxmb83YQ4icPIF61EOz2VdB37A99+95UVMXC9fZT8hKb/U81+Lk5R+8txGIUeJL1tm3bcPXq1VyPX716FePHj3+roAixdvItf/A2umO9faFv30fAiEhpdz5ai76HzJOaD/ztsbw1JTWkZHHuXtD3GIb7o2cjY95q6Hq8k+v+OCJVAuTb/4L9F4Mg27ISSFGVbLAk34LKyOEiy3ovSdCyOBety+MMQkpWgRObjRs38iqVZff06VNs2rTprYIixJqJ71yB5GYIr007eAwgob1CSPE49VKDgUcSkGbgJzXjazngp+bOEOVzhJ2Q4sCWqwzd4NGZ++N8Ng+Guk3B5fA7yWg1kB3YBPtJQyHbtBxMcqIA0ZK8SEUMulfItlnnU6qORixHkZfFSUxMhFwuL+qnJcQ6sEbINq/gNRmr14WxUWuBAiKlXfBzDQYfTUBGtqRmcj1HfN/EKd/ThgkpdmIJjA1bQjNpITJ+3ABdpwHg5ObrMxidBrJDW2E35R3Idv4DqDNyeDIilD7ZNuvc+1QNI0ubdRLLkK81NufOncPZs2dNj/fu3WvaWPN1KpUKO3bsQO3atYsuQkKsiCTkOMTPHvHatEPHAnRzSYrB3qdqfHAyEXp+nQDMbOiESfVorQKxXJxXWejenQhdv/chPbEH0uDtECUn8fowWg1ku9ZAcmw39H1HQt+uFyAu9L7ipIi0LSuHk5RBij4zmYlVszgfo0PrMvShNhFevt4hzpw5g4ULFwIAGIbB3r17sXfv3hz7VqtWDQsWLCi6CAmxFgYDZDv+4TXpm3UAWyVAoIBIabbjcQY+Pp0EY7YPSucFOmN8LQdhgiKkoOwdoe85HPpOAyA9uRfS/ZsgyjYFTZSqgnzdr5Cc2APtiM/
A1qgvTKwEACAXM+haQYGtj7KmoO2MUFNiQyxCvhKbiRMn4oMPPgDHcahRowYWL16MXr168fowDAM7OzvY29sXS6CEWDrJmQMQxb00PebEYuj6fyBgRKS02hiejgnnVMg+++On5s74sAYlNcQKyRXQdxkEfbvekJ7YA9ne9WBSk3ldxJERsFvwOfRN20M39BNwbl4CBUv6V1LyEptdT9RY2MyZyskTweUrsbG3tzclLDdv3oSHhwfs7GjnYEJMdFrIdq3lNRmCeoDz9hUoIFJa/RuWjs/Pq3htDIDfWrpgRHX6YIlYOZk8M8EJ6g7ZwS2QHtoKRqvhdZFePA7JjfPQ9XkP+i6DqDCLANqXVcBFxkD1/yqMiVoWp15q0bEc7WlDhFXg4gEVKlSgpIaQbKTHdkGkijc95qQy6HqPEDAiUhr9cTfNLKkRM8CfQa6U1JDSRWkPXf8PkPHjBuhbdTE7zGg1kG/9E3bffADxrcsCBGjbZGIGvbMVEdj+mIo8EOG9ccSmZ8+eEIlE2LFjByQSidkUtJwwDIM9e/YUSYCEWDx1OmT7NvCa9B37gctlzwZCCuOX0FTMuZrCa5OKgFVt3MyqFBFSWnAu7tB+/BX0bXpCvu5XiJ895B0XRT+HcvEU6Ft2gfad8YCDk0CR2p4Bleyw9vXNOp9poDFwUEhoOhoRzhtHbDiOA8u+ttEgy4LjuDy/Xu9PSGknPbwdTFrWDSensIOuxzABIyKlzeKb5kmNTASsa09JDbENbPU6UM9dCc17X4CzN6/4Jz13GHZfj4T4ymkBorNNrXxk8FZm3Uam6jkceaHJ4wxCit8bR2z279+f52NCbFpGGmSHt/GadF0HA44uwsRDSp2lt1Lx/TV+UqMUM9jYwQ3tfGk+O7EhIjEMHfrAENgG8u1/Q3JqHxguq4KGKDkJyqWzoG/SFroRn4JzdhMw2NJPLGLQ10+JlffSTW3/PVajV0X6sIUIp8g36CTElkiP7gSTkWZ6zNk7Qt91kIARkdLkj7tpmHmFn9Q4SBhs6+xOSQ2xXY4u0I6aBPXM5TCWq2R2WHr5JOy+eh+S80cAjjaOLE4DKvOTmMPPNUjNvrEWISWowInNvXv3zNbPnD59Gv3790f79u2xbNmyIguOEIumzoDsULbRmi6DACUt4iZv75/76Zh+kV/u1l7CYHtnd7Tyof0iCGGrBEA990/o+o4EJxbzjjHpKVCsnAfFL9+ASUnK5RnI22riKUN5h6yfvdrI4cAzmo5GhFPgxGbOnDnYsCFroXRkZCTeeecd3Lx5ExkZGZg5cyY2btxYpEESYomkx3eBSX9tbY2dPfSd+gsYESkt1oen44sLKl6bUsxgc0d3NPOmpIYQE4kUun6joJ7zJ4x+1c0P3zgP5YwPIL51SYDgSj+GYTCgEn/UZtsjqo5GhFPgxCY0NBQtWrQwPd66dStYlsWZM2cQEhKCLl26YNWqVUUaJCEWR5MB2cEtvCZ954GAHW2OSN7OtkcZmHhWxWuTi4GNHdxoZ29CcsFWqAL1rOXQDh4NTsrf10aUnATl4qmQbVwG6HUCRVh6DarM3wLk+EstYtVGgaIhtq7AiU1iYiLc3d1Nj48cOYLWrVujbNmyAIAuXbrg4cOHuZ1OSKkgPb6Htys2p7CDrvNAASMipcHuJ2p8ciYJr68KkIqAte1oTQ0hbySWQN/jHWR89zeMVWuZHZYd3gblt2PBvHhS8rGVYrXcpKjlmlWLiuUyiwgQIoQCJzaenp549uwZAEClUuHKlSto166d6bhWqy266AixRDotpNlHazr1B3IoQUpIfu1/qsaHJxNhfC2rETPA6rZu6FKekhpC8osrUwHqr3/NXHvD8G9zxM8ewW72aEiO76bCAkVoSBX+qM1W2qyTCOSN5Z6za9euHf788084OTnh7NmzAIDu3bubjt+/fx++vr5FFyEhFkZy5hBEry1G5RRK6KgSGnkLRyI1eP9kIgyv3WeJGGBVG1cqnUp
IYYgl0PUbBUOtRlCsnAdRfIzpEKPXQbFmCQy3r0Dz0TSaQlwEBlS2w+wrKabR5uvxejxQ6UFbdZKSVuARm1mzZiEgIAAzZ87E8ePHMWfOHFSoUAEAoNFosGvXLgQFBRV5oIRYBKPBfG1N+z6Ag7NAARFrd/KlBu8eT8DrFVIZACtau6JfJbtczyOEvBlbvS4yvl0FfdP2ZsckV8/Abs4YiJ4/FiCy0sXXXoygbGsAt9J0NCKAAo/YeHp64uDBg0hJSYFCoYBMJjMd4zgOe/bsQbly5Yo0SEIsheTKaYjiXpoecxJpZtEAQgrhdJQWw44mQpttne1vLV3MpnYQQgrJ3hHasTNhrNsU8nW/gNFk3XCLYl5A+e1YaEdNhqFFJwGDtH6DqyhxKiprOcLWRxkYVE/AgIhNKvQGnU5OTrykBgCUSiXq1KkDV1fXtw6MEIvDcZDu38RrMrToBM7VQ6CAiDU79VKLIUcSoDby5/n/1NwZI6rTXkiEFCmGgaFVF2R8u8qsLDSj00Kxch5km5YDLFXzKqxeFZVQvLad0LM0I0JTaB94UrIKPGIDAEajEcePH8eTJ0+QlJQELtsCPIZhMHXq1CIJkBBLIb5zFeKn4abHHMNA132ogBERa3XqpQZDjiZAk+0eal6gMz6sQfP9CSkunLcv1N8shXz9UkhP7eMdkx3aClHUM2jGzqSNlgvBSSZC9wpK7IjIGhE7GCcGrUAlJanAiU1oaCjeffddREZGmiU0r1BiQ0oj6X7+xrPGhq3AlakgUDTEWp18qcHQHJKa75o4YXwtSmoIKXYyObQfTIaxai3I1/4MRq83HZLcDIHy2/HQfDEfnFdZAYO0ToOr8BObo/ESaI0c5GIqI0BKRoHHCCdPnoy0tDSsW7cOERERSEpKMvtKTEwsjlgJEYwoIgySu9d4bboewwSKhlirEy9yTmq+b+KEibWpXDghJckQ1A3qb34Hm206sfjlEyi/HQfR4/sCRWa9Ovgq4CbPurVMMTA4EqkRMCJiawqc2ISGhuKzzz5Djx494OLiUgwhEWJ5pIe28h4ba9QDW6WmQNEQa3T8hQbDjuU8/WwCJTWECIKt5A/17D9grFSD1y5KVUG54HOIb14UKDLrJBUxGFCJX6J+6yPa04aUnAInNl5eXpBICrU0hxCrxCTGQnL5JK9N151Ga0j+HY3U4J1ckhqafkaIsDhXD6i//hX6Zh147YxOA8UvX0Fy5qBAkVmnwdkqOh56roFKy+bSm5CiVeDEZvTo0di8eTP0r81JLaxz585h6NChCAgIgIuLCzZs2PDGc+7cuYPu3bvDx8cHAQEBWLhwodlan7Nnz6JNmzbw9vZGvXr1sHr16reOldgu6dFdYIxZd6RsmQow1gkUMCJiTXZGZOQ4UjOfkhpCLIdMDu0nM6DrOZzXzLAsFKsWmo3ak9w19pSismNWeTQdC+x5SnvakJJR4KGXsmXLQiKRoHnz5nj33XdRrlw5iMVis379+vV743Olp6ejZs2aGDZsGD755JM39k9JSUG/fv3QokULHD9+HOHh4Rg/fjzs7OwwceJEAMCTJ08wePBgDB8+HH/++SdCQkIwadIkuLu7o0+fPgV9ucTWaTWQntzLa9J1GQiIqIQlebM1Yen4/LwK2cusLAh0xlhKagixLAwD3aCPwbm4Q7ZhKZjXPjSVb1oO6LTQ9x4hYIDWgWEYDK5ihx9upJraNj/MwHtUxp6UgAInNh9++KHp/+fOnZtjH4Zh8pXYdO7cGZ07dwYAjBs37o39t23bBrVajRUrVkCpVKJmzZp48OABli9fjgkTJoBhGPzzzz/w8fHBokWLAAD+/v64cuUKfv/9d0psSIFJzgeDSc96c+bsHWFo0VnAiIi1+O1WKmZdSTFrX9TMGR8HUFJDiKXSd+oP1tUDij++41VMk//3NxidFroBHwIMVfnKS/bE5nyMDs/SDKjgQEsZSPEq8G/Y3r1739ypmFy6dAnNmzeHUpm
1MK1Dhw6YN28enj59Cj8/P1y6dAnt27fnndehQwds2rQJer0eUqm0pMMm1oplITu8ndekb9sLkCsECohYA47j8N21FPwcmsZrFzPA8tauGJJt/jkhxPIYGwdB8+VCKJZ8DUaXVdVLtnc9oNdBN3QsJTd5qOwkQRNPKS7HZSWG2x+r8WVdKpRCileBE5tWrVoVRxz5Ehsbi7Jl+XXlPT09Tcf8/PwQGxuLtm3bmvUxGAxISEiAj49Pjs8dHh6eYzspGtb483V8dBtVo56ZHnMiMR5UqQe9Fb4WoVnj9S8MlgN+fCTFf9H8D1BkDIf5NXRoyL6AjfwoeGzl+pOcWe31lzrCfthnqLLpV4hfT24ObUViSiqi2vWj5CYPbR0luBwnMz1ed1eFnopo+pHZmOL491+tWrVcjxV6TFCtVuP69euIi4tDy5Yt4eHh8eaTigCT7V/Eq8IBr7fnp092ef2QyNsJDw+3yp+vYvefvMeGwLbwa9RUoGisl7Ve/4LSGTmMO5uE/6L5i2QdJAw2dvRAUBm5QJEJy1auP8mZ1V//atWg9asE5eIpvGnJPucPwtXbB/q+IwUMzrKNKW/EzxFRMHKZ914RahHU7hVRz132hjNJaSHEv/9CrYD+448/4O/vj549e2LUqFG4c+cOACAhIQEVKlTA2rVrizTIV7y8vBAbG8tri4+PB5A1cpNbH4lEAjc3t2KJi5Q+THQkJLcu89r0nQcKFA2xdCk6FoOOJGD7Y35S4ypnsKer7SY1hJQGbOUaUE/7GZw9fxqVfOc/kO7fKFBUls9DIUZzF36Z562PqDoaKV4FTmw2bNiAr776Ch07dsTSpUt5pZbd3d3Rrl077Ny5s0iDfCUwMBAXLlyARpM1JHzixAmUKVMGFStWNPU5efIk77wTJ06gQYMGtL6G5Jv0xB7eY2OVALBVAgSKhliy6AwjehyMx6koLa+9jJ0IB7p5oqEnfTpJiLVjK1aDesoicEp+ZS/51j8hOb5boKgsXzcvA+/x9scZMLLZ60QSUnQKnNgsW7YMXbp0werVq9GtWzez4/Xr10dYWFi+nistLQ2hoaEIDQ0Fy7KIjIxEaGgonj9/DiCz6lrv3r1N/QcOHAilUolx48bh7t272LNnD3755ReMGzfONM1s1KhRePnyJaZPn46wsDCsXbsWGzduxIQJEwr6Uomt0mogzbYhm77Dm6v8EdsTnqxH5/1xuJXI39eripMYB7t7IsCVPkwhpLRgK9WAevKP4LIVkJGv/QXibJs4k0xBbkY4SrOWAcSoWbMPgQgpSgVObB49eoQuXbrketzd3R0JCQn5eq7r168jKCgIQUFBUKvVWLBgAYKCgjB//nwAQHR0NCIiIkz9nZ2dsXPnTkRFRaFdu3aYMmUKxo8fz0ta/Pz8sHXrVpw/fx6tW7fG4sWLsXDhQir1TPJNcvEEv8SzgxMMTdoIGBGxRJditei8Pw7P0vg7bzbykOJwD0/4OVJZU0JKG7ZqLai/XAhOljW9lOE4KP6YB/HdawJGZpkUYqC3n5LXtuVRhkDREFtQ4L+8jo6OSE5OzvX4o0eP8l1IoHXr1lCpVLkeX7FihVlbrVq1cPDgwRx6Z2nVqhVOnz6drxgIyU56fBfvsT6oByCjNRIky8FnanxwMglqI39KRZdycqxu6wZ7KW3gSkhpxdaoB8342VD8OgMMm7mGhDHoofh1BtRf/QLWr7rAEVqWwZXtsCE8K5nZ91SDdD1L75OkWBT4tyooKAgbNmyAVms+lPjixQusWbMGHTt2LJLgCClposf3IY7ImkrJMQz07XoJGBGxNH/dS8Pw44lmSc2IanbY0MGd/lgTYgOM9VtA++E0XhujyYBiyVdgEmNzOcs2tfKRoaxd1vtiuoHDgWeaPM4gpPAK/Bd4xowZiI+PR9u2bfHXX3+BYRgcOXIEc+bMQcuWLSGVSjF16tTiiJWQYifNtgjUWCcQnFfZXHoTW2JkOXx1UYUpIcnIvvZ
1an1H/NbSBRIRbdBAiK0wtOoC7ZBPeG0iVQIUS74GNDTd6hWxiMHAyvyNibfSdDRSTAqc2FSuXBmHDh2Cj48PFi5cCI7jsGzZMvz666+oV68eDh06BF9f3+KIlZDilZYCScgxXpO+A63NIkCansW7xxOx4m46r13EAL+0cMHXDZzy3CeLEFI66bsPha7rYF6b+NlDKP6YB7DGXM6yPYOr8BOb4y+1SNDQz4cUvUKtbvX398fOnTuhUqnw+PFjsCwLPz+/Etukk5DiID13GIxeZ3rMenjDWJc25LR1URlGDDmSgNBslc/sJQxWt3VDl/KKXM4khNgC3ZAxEMW8gOT6OVOb5Po5yLashG7YOAEjsxy13aSo6SLBXVVm+WcjB+x/psF71e3fcCYhBVOgxEar1WLLli04ceIEIiIikJaWBgcHB1SuXBnt27fH4MGDIZPRng3ECnEcJKf285r0bXsDIrFAARFLcCtRj6FHEvAig//JYlk7ETZ3dEdd2kGbECISQ/PJN1DO+xTiZw9NzbJDW8GWqQBD254CBmc5+lRS4u71rIqju5+oKbEhRS7fU9Hu3LmDwMBAfP7559i1axciIiKgVqsRERGBnTt34tNPP0WzZs3yvYcNIZZE9OguxC+emB5zIhEMrbsKFxAR3JFIDbrtjzNLauq4SXG0pxclNYSQLAo7aL6YD9aFP3NFvnYJxHeuChSUZemTrezzqZdaJGlZgaIhpVW+Epu0tDQMGzYMcXFxmDlzJu7cuYOnT5/y/jtjxgxER0dj6NChSE9Pf/OTEmJBpNlGa4z1m4NzcRcoGiK0f+6nY+jRBKQZspVzLq/Awe4eKGtPI3mEED7OzQuaz+eBk2VNT2WMRih+nwXm5VMBI7MMNVyk8HfOmihk4ID9z9QCRkRKo3wlNhs2bEBkZCS2bNmCL774AmXL8qtElS1bFl9++SU2bdqEp0+fYuPGjcUSLCHFQp0BycXjvCZ9mx4CBUOExHIcZl9OxhcXVMhWzRljAuyxsb0bHKicMyEkF2wlf2jGfMNrYzLSofz5KyBVJUxQFqRPJf6ozZ4nlNiQopWvv9DBwcFo3749WrdunWe/Nm3aoF27djh06FCRBEdISZBcPA5Gm1VTn3XxgLFOoIARESFoDBw+OJmEX2+n8dpFDPBDU2csbOYCMZVzJoS8gbFxa2gHj+a1ieJeQrFsLmA0CBSVZehTkZ/YnHiphYqmo5EilK/E5u7du2jVqlW+njAoKAh37959q6AIKUnS0/xpaIbWXQFxoQoGEiuVoDGiz+F47Mr26aGdhMH69m74pKaDQJERQqyRvvsw6IO689ok965DtvkPgSKyDDVdJaj22nQ0PQscfE6bdZKik6/EJikpCV5eXvl6Qk9PTyQlJb1VUISUFNHzxxA/usdry/7HiJRuj1MM6LQvDhdjdbx2L6UI+7t5oHsFZS5nEkJILhgG2pFfwOhfj9csC94OyblggYISHsMwZqM22T9QIuRt5Cux0Wq1kEql+XpCiUQCnU735o6EWABJ9tGamg3BeZXNpTcpbW4m6NB5fxwep/Irn/k7S3CkhycaeFDlM0JIIUmk0EyYA9aN/8Gw/J/FED15IFBQwsu+zubECw2SdTQdjRSNfM+3efLkCa5efXPJwoiIiLcKiJASY9BDev4Iv4mKBtiMkBgtBh9NQIqOXyWglY8M69u7w0VORQIIIW+Hc3KF5tNvoZw3EYw+c5NfRq+D4reZyJizEnByETZAAdR2laCyo9j0gZKOBY6/0KBfJTuBIyOlQb4TmwULFmDBggVv7MdxHBiGFtgSyye+EQImLcX0mLN3hKFh/taSEet24oUGw48nIiNbOechVZRY2tIVMjG9hxFCigZbqQa070+G4q+seyhRQgwUy+ZAM3Wxza3pZBgGPSoqsfS1Qi0Hn1NiQ4pGvv41LVu2rLjjIKTESc/xq/fpm3UAZHKBoiElZe9TNT48mYjsMx8m1HLAd02c6IMZQki
RM7TqAt2TMMiO7DC1Se7fgGzzH9ANnyBgZMLoWl7BS2yORGpgZDmqPEneWr4Sm3feeae44yCkZKWoIL4ZwmsytOwiUDCkpGx+mIHxZ5PM9qj5poEjJtdzpKSGEFJsdEPHQfz8EcT3b5raZMHbwfpVh6FlZwEjK3lNvWRwkTFQ/X8qcJKWw6U4HZp704eL5O3QJHJik6QhR8EYsxaMs2UqgK1cQ8CISHFbdS8Nn5wxT2oWBDpjSn0aqSGEFDOJBJrxVEwAACQiBp3KKXhth6nsMykClNgQmyQ5e5j3WN+qC0A3tqXW77dTMTkkmdcmYoDfW7lgbC3ao4YQUjJeFRPgXqs0y+h1UCydCaQl53Fm6dO1PD+xOUSJDSkClNgQmyN6/hjip+GmxxwjgqGFbU0DsCW/307FjMspvDapCPinrRverWYvUFSEEFv1qpjA60TxMVCs+B5gjbmcVfp08FXg9Tot91UGPEk1CBcQKRUosSE2R3KWXzTAWKsRODdPgaIhxSmnpEYpZrCpgzv6+NHGm4QQYRhadYGuYz9em+T2Zch2/CNQRCXPRS5Cc2/+XmE0akPeFiU2xLYYDZBcyLZ3TSsqGlAa5ZbUbO7ojo7Z5nYTQkhJ0w0bB2PV2rw22d71EF87K1BEJa8LTUcjRYwSG2JTxLevQJScZHrMKe1p75pSKK+kpk1ZqrpDCLEAEik0E+aAdXbjNSv+XAAm+rlAQZWsbtkSm3PRWqRkr8VPSAFQYkNsiuTCUd5jQ5M2gJw+vS9NKKkhhFgLztUDmvFzwInFpjZGnQ7FbzMBTYaAkZWMqs5SVHHKeu16FjjxUitgRMTaUWJDbIc6A5KrZ3hNehvbO6C0W3YnjZIaQohVYf3rQjd0LK9N/OIJ5KsXARyXy1mlR/bpaCde0HQ0UniU2BCbIbl2Fowu65Mg1s0LbPW6AkZEitLS26n45hK/XColNYQQa6DvNAD65h15bdKLJyA9vF2giEpOB19+YnP8pRacDSR0pHhQYkNshlnRgOYdABH9EygNfruVipk0UkMIsVYMA+2oSTCWq8xrlm1ZAdH9G8LEVEKae8sge+1P8bM0IyJSbafsNSladFdHbAKjSoD49lVem6FFJ4GiIUXp11upmHWFkhpCiJWTKzM377TL2l+LYVkols0FkxgnYGDFy04iQnNv/nv1iZc0HY0UDiU2xCZILh4Hw2VVWjGWrwI22ydjxPosCU3F7BySmi2dKKkhhFgfzrscNGO+4bWJUpKg+H02YNALFFXxa5ft/fr4CyogQAqHEhtiE8yqodFojdX76WYq5l7lJzV2EgZbO7kjqAwlNYQQ62Ss3wK6Pu/x2sSP7kK2cZlAERW/ttkSmzNRWhhYWmdDCo4SG1LqMVHPII4IMz3mGAaGZu0FjIi8rcU3U/HdtZyTmtaU1BBCrJyu70gY6gTy2mTHdkFy9rBAERWvuu5SuMuzbklT9ByuxesEjIhYK8ETm1WrVqFu3brw9vZGmzZtcP78+Vz7LliwAC4uLjl+xcVlzj89c+ZMjscfPHhQUi+JWBhpttEaY4364Ny8BIqGvK0fb6Tg+2xJjb2EwbZO7mjlQ0kNIaQUEImh+WQGWM8yvGb5vz9B9DRcoKCKj4hhzEZtaDoaKQxBE5sdO3Zg+vTpmDRpEk6fPo3AwEAMGjQIz5/nvOPuxIkTERYWxvtq2bIlWrVqBU9PT17fkJAQXr8qVaqUxEsilobjaBpaKbLwRgrmX0/ltb1KalpSUkMIKU0cnKCZ+C04qczUxOh1UCydBaSl5HGidWrny38PP0kbdZJCEDSxWbZsGd555x2MHDkS/v7+WLRoEby9vbF69eoc+zs4OMDb29v0pdfrceHCBYwcOdKsr6enJ6+v+LVdfYntEEWEQRT70vSYk0hhaNRawIhIYS24noIF2ZIaBwmD7Z3d0YKSGkJIKcRWrAbt+5N4baK4KCj++B5gS1dJ5HZl+fvZXI7TIVnH5tKbkJwJltj
odDrcuHED7dvz1zq0b98eFy9ezNdzrFu3Ds7Ozujdu7fZsbZt28Lf3x+9e/fG6dOniyRmYn0kIcd4j411mwL2jgJFQwqD4zjMvpyMhTf4SY2jlMF/nd3NyoQSQkhpYmjVBboOfXltkluXINu1RpiAiomvvRjVnSWmx0YOOBtFozakYCRv7lI8EhISYDQazaaQeXp6IjY29o3nsyyLDRs2YOjQoZDLs25sfHx88PPPP6Nhw4bQ6XTYsmUL+vTpg3379qFly5a5Pl94eOmbs2pJBPn5sixqnedvyvncryZUdK1LXGGvv5EDFjyUYXcM/63KXszhlwAN3FKeIbz0zcgodej91bbR9X97TGAXVA27BYfIR6Y22e61eC53Qkr1egJG9mYFuf4N7KR4kCw1Pd59LwbVdaW3zLUtKI5//9WqVcv1mGCJzSsMw/Aecxxn1paT4OBgREZG4r33+CURq1WrxnvBgYGBePbsGZYuXZpnYpPXD4m8nfDwcEF+vuJ71yFLVZkecwolPLv2h6dckftJpMgV9vprjRzGnE7C7hg1rz1zpMYDgV40UmMNhPr3TywDXf+iw0z6Aezs0RClJJnaKu/7BxlzVoLzLidgZLkr6PXvK1djS1Si6fFttRLVqvkVQ2SkJAjx71+wqWju7u4Qi8VmozPx8fFmozg5WbNmDZo2bYqAgIA39m3UqBEeP35c6FiJdZKEHOc9NjRoCVBSYxXS9SyGHU3Arif8pMZdLsLerpTUEEJsD+fmCc342eBEWbduTEY6FL/NBLTqPM60Hs295RC99tn2XZUB8ZrStZaIFC/BEhuZTIb69evjxIkTvPYTJ06gadOmeZ4bFRWF4OBgs9Ga3Ny6dQve3t6FjpVYIYMeksun+E3NOggUDCkIlZZFv8MJOJ6tIk45ezEOdvdAfQ9ZLmcSQkjpxtaoD92Qsbw2cWQE5KsXA5z1b2jpIhehrpuU13YumvazIfknaFW08ePHY+PGjVi7di3CwsIwbdo0REdHY9SoUQCAuXPn5lgYYP369bC3t0e/fv3Mji1fvhz79u3Do0ePcO/ePcydOxf79+/Hxx9/XOyvh1gO8e0rYNKzFl9w9k4w1m4sYEQkP2IyjOhxMA6X4vh/yKo6SXCwuwequ0hzOZMQQmyDvstA6Ju247VJQ45BeuQ/gSIqWtn3IztDBQRIAQi6xqZ///5ITEzEokWLEBMTg4CAAGzduhUVKlQAAERHRyMiIoJ3DsdxWLduHQYNGgQ7Ozuz59Tr9Zg5cyaioqKgUChMz9m5c+cSeU3EMmSvhmZo0gaQ0E2xJbuXpMfgowl4nsafdlDXTYr/OrvDU0kl2wkhBAwD7QdTIIqMgPjFE1OzbPMKGCtWB+tfV7jYikDrMnL8fifN9PhsNCU2JP8YlUpl/WOXxKKV+OIxrQb2n/YDo8mac6yevgTGgAYlFwMxyc/1P/lSg/eOJyJFz387au4tw+aO7nCWCTq4TN4CLR63bXT9iw8THQm7OWPAqNNNbayzK9Rz/wLn6iFgZFkKc/2TdSwqbYwC+9qfgwdDfeBFH25ZHZsqHkBIcRHfDOElNayLO4xW/glWafZvWDoGBieYJTWdy8nxX2dKagghJCecTzloRn/NaxMlJ0Hx+xzAYL0lkp1lItRzz77OhkZtSP7QHQMpdaTZp6EFtgNE9EmPpdEZOUy6oMLn51UwZBs3HuVvh40d3GEnobcoQgjJjbFhS+h6vctrEz+8Ddmm5QJFVDRaZ1tnc5YKCJB8orsGUrpkpEEcGsJrompolidWbUSfw/H4+346r50B8F0TJ/zc3AUS0Zv3syKEEFun6z8KhjpNeG2yozshORcsUERvjwoIkMKixIaUKpKrZ8Dos4bgWa+yYCvXEDAikt35aC3a7InFhRj+J3B2EgZr27thYm3HfG3SSwghBIBIDM0nM8B6+PCa5f/+BNGzhwIF9Xaaecsgfu3PwINkA2IyaD8b8maU2JBSxWxTzqbtAbp
Jtggsx+GX0FT0OhSPqAyWd6yigxjBPTzRq6JSoOgIIcSKOThDM/FbcNKsfb4YnRaK32YB6akCBlY4TjIR6mdbZ0PV0Uh+UGJDSg0mJQniu1d5bTQNzTIkaowYejQBc66mwJhtPU3bsnKc6OWJ2m5UjpsQQgqL9asO7cgveG2iuJdQrJwHsGwuZ1kumo5GCoMSG1JqSC6dBPPam7exXGWw5SoJGBEBgNAUEYL2xCE40vyP0hd1HLC9kzvcFFTcgRBC3pahdTfo2/E3NpfcDIFs9xqBIiq8VmX4ic35GCogQN6MEhtSaphtykmjNYLiOA7L7qRh9C05ItP5c6Nd5Qy2dnTH7MbOVCSAEEKKkHb4BBirBPDaZLvWQHzjgkARFU5TLxlE2dbZxKlpnQ3JGyU2pFRgEmIgDr/NazM0ay9QNESlZTHieCK+uZQMI8dPXJp4SnG6txc6l1cIFB0hhJRiUhk0E+aCdXLlNStWzgMT80KgoArOSSZC3WxTlLMXnSEkO0psSKkguXiC99hYpSY4zzICRWPbbibo0HZvLPY905gdG1/LAfu7eaK8g0SAyAghxDZwbl7QjpsFTpR1m8dkpEGxdBagNX9vtlQtfGS8x+djaJ0NyRslNqRUkFw4wntM09BKHsdx+DcsHZ33x+FJKn+6gJOMwfr2bpgX6AyZmKaeEUJIcTMGNIBu8Bhem/j5I8j/WQxwXC5nWZYW3tnW2dBGneQNKLEhVk8UGQHxs0emxxwjgiGwrXAB2aB0PYtPziTh8/MqaLNNgQ5wMOJ0by/0pFLOhBBSovRdB0Mf2I7XJr1wFNKjOwWKqGCae/NHbG4l6pGss74Kb6TkUGJDrF72ogHGmg3BubgLFI3teaDSo+O+OGx5pDY79mENe/xVVws/R5p6RgghJY5hoP1wCoxl/XjNsk3LIHoQKkxMBeCuEKOGS9bfDw7ARVpnQ/JAiQ2xbhwHyYWjvCZDi44CBWN7tj/OQLu9cbinMvDa7SUM/gpyxU/NXSCndxlCCBGOwg6aT78Fp7AzNTFGIxS/zwGjShAwsPzJPh3tAq2zIXmgWw5i1UQP70AUH216zEllMDRqLWBEtkFr5DD5ggofnUpCuoE/V7uGiwTHe3liUBW7XM4mhBBSkrgyFaAZ/RWvTZScCMWyOYDBkPNJFsK8gACN2JDcUWJDrJrZaE39FoDSXqBobENEigFdD8Rh1f10s2ODqyhxrKcn/F2kOZxJCCFEKMZGraHrOZzXJn5wC7L/VgkUUf40zzZicy1ehwwDrbMhOaPEhlgvgwHSS/wyz4bmVA2tOO2MyECbPbG4Hq/ntctEwC8tXLCytSvspfS2Qgghlkg34AMYajXmtckObIb4ZohAEb2Zr70Yfo5i02M9C1yJ0+dxBrFldAdCrJb4zlUwqcmmx5ydA4x1mwoYUemlNnD48rwKo04mIUXPn3pW0UGM4B6eeN/fHgxDpZwJIcRiicTQjJ0B1s2L16z4cz6YxFiBgnqz7KM2tM6G5IYSG2K1zPauadIGkMpy6U0KKzxZj477YrE6zHzqWc8KCpzq7YX6HvRzJ4QQq+DoAs3YGfzNO9NSoPhjHmC0zPU2LbKVfab9bEhuKLEh1kmrhuTaWV6ToTlVQytqWx5loO2eONxJ4v+xk4mAH5s6Y117N7hQ2TNCCLEqbPW60PX/kNcmDrsJ2e51AkWUt5Y+/BGbS7E66IzWsckoKVl0R0KskuTaeTBajekx6+oBo389ASMqXeI1Rrx/IhFjTptXPavkmDn1bHRNB5p6RgghVkrfY5jZehvpnrUQ370mUES5q+Qoho8y65ZVbeRwM4HW2RBzlNgQqyQJyVYNrVkHQES/zkVh71M1mu2Mxa4n5htu9q+kpKlnhBBSGohE0I75Gqyzm6mJ4TjI//geTHKigIGZYxgGLXxonQ15M7oTJNYnVQXxrUu8JpqG9vaepRkw/FgCRhxPRLyGX0pTIQZ+beGCv9u4wklGbxuEEFI
acM5u0I75Btxro++i5ETIV84HWMsqqdw82zqbc7SfDckB3aEQqyO5fAqM0Wh6zJatCLZCVQEjsm46I4cloalouiMW+59pzI7Xd5fiRC8vjKSqZ4QQUuoYazWCvvcIXpvkzhVI928SKKKctcihMpqRpXU2hI8SG2J1pNk25dQ36wDQDXehnHqpRavdsZh7NQXqbAsxpSLgmwaOONLTEwGutOEmIYSUVro+75mtU5Xt+Buih3cEishcgKsELrKsv/UpOg53VZZZxY0IhxIbYlWY+GiIH9zitdE0tIKLzjDio1OJ6HM4Hg+Szf8wNPaU4ngvL0yp7wSpiJJGQggp1cQSaD6ZAc7BydTEsCwUf3wPqM1L/QtBxDDm+9lE0zobwkeJDbEqkpBjvMfGKjXBeZUVKBrrY2A5/HE3DU12xGD7Y/PiAK5yBr+1dEFwD0/UcaNRGkIIsRWcmyc0o7/mtYnioiBf+4swAeWghU+2/WxonQ3JhhIbYlUkF/iJDY3W5N/lWB3a7Y3D9IvJSNWbz0t+r7odrvT3xnvV7SGiqX2EEGJzjPWaQddpAK9Nev4IJOeP5HJGycq+zuZ8jBYcR+tsSBZKbIjVED1/DHHkY9NjTiSCoWk7ASOyDklaFp+fS0Ln/XG4lWhe97+umxRHenjit5aucFeIBYiQEEKIpdANHg1jucq8NvmaJWDiogSKKEtddynsJVkfvMWqWTxKoXU2JAslNsRqSM4H8x4bazUG5+QqUDSWj+M4bAhPR+P/YvDvgwxk/0zLScrgh6bOON7LE028aF8aQgghAGRyaMfOACfN+rvAaDIy19sYhU0ipCIGgV40HY3kTvDEZtWqVahbty68vb3Rpk0bnD9/Pte+T58+hYuLi9nX0aP8Kllnz55FmzZt4O3tjXr16mH16tXF/TJIcTMaIDnHT2xoGlruojKMGHI0AePPqpCgNd+LYFBlJS7198YnNR0goeIAhBBCXsOWqwzd0LG8NvHDO5DtWSdQRFlaZNvP5jwVECCvETSx2bFjB6ZPn45Jkybh9OnTCAwMxKBBg/D8+fM8z/vvv/8QFhZm+goKCjIde/LkCQYPHozAwECcPn0aX375JaZOnYrdu3cX98shxUh8+wpEr+2EzCnsYGjcWsCILBPHcdj+OAPNd8YgONL8zb6aswS7u7jjrzZu8LGjaWeEEEL+1959x1VV/38Af527GSpDBDdqimQq7oE5cGSWs3BkpegvM0fbWd/MzJSwzJGlqZUlppWZKWJqqKSiX0dKmog4UdlTxp3n94dfLxwuEPty4fV8PHjked8z3pcDdN73swqnHzgKBp9ekpjy1+8gu3LBShk90Nuj4DgbtthQHqsWNp9//jmee+45TJo0CV5eXggKCoK7u/u/trC4uLjA3d3d/KVS5VXvX3/9NTw8PBAUFAQvLy9MmjQJEyZMwNq1ayv77VAlUoSHSrYN3fsDajvrJFNN3debMD08Ff93JBVpOmnHM40ceLdzXfw5sgH6NdJYKUMiIrIZggDt1Lkw1cvr8i2IJmjWLwWyMq2WVpf6KqjyPb3eum/E7fscZ0MPWK2w0el0+Ouvv+Dn5yeJ+/n54eTJk8Ue+8ILL+CRRx7BE088YdESc+rUKYtzDhw4EOfOnYNebzlwmmzA/Qwozh2ThPSPD7VSMtVTZIoe/XcnYnuM5RTOvdxVOD7KHW93rAO1nN3OiIioZMS6ztC+tEASkyXFP5gC2kqzkWkUArq4Sbuj/RnHVht6QGGtCycnJ8NoNMLNzU0Sd3NzQ0JCQqHHODo6YsmSJejZsycUCgVCQkIQEBCAL774AuPGjQMAJCQkoH///hbnNBgMSE5OhoeHR6Hnjo6OLv+boiKV5/tb/3QYHA15RanW2Q1R0AC8ZwCAX+Pk+DhGBZ0oLVpUgohXmusxoXE2jPFpiI63UoLg71dtx/tfu/H+2ziNMxp3H4QGp/LGMysjDuGORwukPtbjXw+vjPv/qEqJE8hbay0
kKhFdRRY31VFl3P/WrVsX+ZrVCpuHhALrZYiiaBF7yNXVFbNnzzZvd+rUCSkpKVi1apW5sCnqnIXF8yvum0TlEx0dXa7vr93WT6SBAcPRuk2bcmZl+3RGEfNPpmPzVctVob2dFNjc3wXeztZfZLO8959sG+9/7cb7X0N4zoUx7jrkt2LMoea/b4Nb3yEQXRsUeVhl3f+Rjlpsup1k3j6frUbr1s0r/DpUPtb4/bdaVzRXV1fI5XKL1pmkpCSLVpzidOnSBdeu5a1t0qBBg0LPqVAo4OLiUr6kqcrJYq9Dfv2yeVsUBBj6PGHFjKqH+GwjRoQmYXOUZVHzYht7HBruVi2KGiIiqgGUKuRO/w9EZd7/V4TsLKg3LgdMljNvVrZubpbjbG5mcpwNWbGwUalU8PHxQVhYmCQeFhaGHj3+vWnzocjISLi7u5u3u3fvjsOHD1ucs1OnTlAq+aBnaxTH9ku2jd6dILq6F7F37fB3ih4D9yQiIkHa7G6vEPBVX2es9nWGvcLqM7kTEVENIjb2hG7sy5KY4tJZKA/8XOW52CkEdLUYZ8Npn8nKs6LNnDkTwcHB2LJlC6KiojBv3jzExcUhICAAALB48WKMGDHCvH9wcDB+/PFHREVFITo6GmvWrMHGjRsxbdo08z4BAQG4e/cu5s+fj6ioKGzZsgXBwcGYNWtWlb8/KieDwaKwMfSp3ZMGHIzNxZMhiYjNMkrizR3l+P0pN/i3srdSZkREVNPpB42B4dHOkpjqxw2QxV6v8lwebyid9jn8HgsbsvIYmzFjxiAlJQVBQUGIj4+Ht7c3duzYgWbNmgEA4uLicP269JdlxYoVuH37NuRyOVq1aoW1a9dKxtd4enpix44dWLhwITZv3gwPDw8EBgZi5MiRVfreqPzkfx2HLD3VvF3b167ZfDkLcyLSYCwwEc2ARmps6ucMFw3XpSEiokokk0H70nzI35kCIfs+AEDQ66FevxQ5i74AFFXXM6aPhxqByJt2+s84XbHjtKl2ENLS0qwzXx/VGmUdPKZZMQeKyP+at/V+I6Gd9EZFpmYTRFFE0PlMfHTOct2Al9o6YFmPelDIqu8fcg4ert14/2s33v+aSXHiEDRfLpHEdE9PhM7/JUmsMu9/rkFE8+C70ObrwPDXs+7wrGP1ebHof2rV5AFExRES70H+92lJTN//aStlYz0mUcTCU+kWRY0AYFn3evi4Z/UuaoiIqOYx9BoIfQ/pmoHKvdsgu3KhynLQKAR0KzDOht3RiIUNVUvKI3sh5Fv8y9iiLUzNa9enfgaTiFl/puGLS9KZz+zkArYOdMEr7RzZ5E5ERFahffF1mJzrm7cF0QTNhmVATnaV5WAxzoYTCNR6LGyo+jEYoAjfJwnVttaaXIOISWEpCL4q/R9EXZWAX55wxbBmdlbKjIiICIBjXWj/b74kJEu8B3Xw2ipLoY+HtLA5dk9nXruQaicWNlTtyM+fgCwt2bwtauxgKNDkXZNl6k0YezAZe2/lSuIN7GTY+6QberqriziSiIio6hgf6wrd4GckMeXREMjPhFfJ9bu6qZB/3pw72UZczzQWfQDVeCxsqNpRHt4j2Tb0HATY1Y5pjFO1JowKTcLRAv2EmzrKse9JN7R34VpMRERUfejGToOpYTNJTPP1CgjpKZV+bbVcQPcG0g/7jtxld7TajIUNVStCUhzkkacksdrSDS0l14gRoUk4k6SXxL3qKbB/mBta1eNML0REVM2o1Mh9+R2I8rymEyEzHerNQUAVdAvr30ha2PxxN7eIPak2YGFD1Yryj1+lkwY0bwNTCy8rZlQ1knONGB6ahMgUaVHTub4SIcPqo5ED16ghIqLqydTCC7pRkyUxxV8n4Hqu8ruk+RUobI7c08Jg4jib2oqFDVUfOi2Uh/dKQvoBw62UTNVJ+l9RczHVIIn38VDh16H14cqFN4mIqJrTPzUBxkfaSWKND+yAEB9bqdft4KqEizrvcTZDJ+Jskq5Sr0n
VFwsbqjYUJw5CyMowb4sOdWDoPdiKGVW+xBwjRuxLwqUCRU3fhmrsGOyKOkr+ihIRkQ2QK5A7bSFEtSYvpNc+mALaaCjmwPKRCQIGFGi1OXSH42xqKz41UfUgilAe3CkJ6fs9BeT7A1nTJOQ8aKm5lCb9g9+/kRo/DHKBvYK/nkREZDtE98bQPjdLEpNfvQjl3m2Vel2/xtLCJoyFTa3FJyeqFmRXIiG/FWPeFgUZ9H4jrZhR5YrPNmL4viRcLlDU+DVSY9tAVxY1RERkkwz9noLBp7ckptr1DWTXoyrtmgMaST8EPZ2kQ5rWVGnXo+qLT09ULagO/CzZNnbqDdGtoZWyqVz3so14OjQJUenSomZQYzWCB7rCTiFYKTMiIqJyEgRop7wNUx2nvJDRCM36pYCuclpSGjnI4e2UN3OoSYTFsglUO7CwIasTkhMsFvPSD3mmiL1tW+x9A54KSUR0gaJmSBM1vvdzhYZFDRER2Tixngu0U96WxGT3bkG1Y32lXXNAwe5onPa5VmJhQ1an/ONXCKa8JmNjkxYwtvWxXkKV5GamAU/tS8K1AqsiP9FUg+9Y1BARUQ1i7NwHST59JDHVgZ0Wa9VVlIGNpd3RDt3RQqyCdXSoemFhQ9aVkw3lH79KQvpBYwChZj3k3/hfUXPzvrSoeaqZBt8NcIFaXrPeLxER0Z3B42Aq0K1c/dVyIDOtwq/Vy10Fdb7VEW7dN+J6gQ8SqeZjYUNWpTyyB0L2ffO26FAXht6DrJhRxYtJN2BYSCJis6R/YEd6avDNABeoWNQQEVENZFJrHkwBLeQ9bsrSU6DZvAKo4NYUe4UMvdyl3dEOxrI7Wm3Dwoasx6CHMnSHJKQfPBpQ21kpoYp3JU2Pp/Yl4m62dHaWZ1rYYVM/FyhlLGqIiKjmMrVpD/2I5yUxxdk/oQjfV+HX8iuwnk3obRY2tQ0LG7IaxYmDkKUmmbdFlRq6QaOtmFHFupCsw1P7khCXIy1qxrWyw/q+zlCwqCEiolpAN+JFGFu0lcTU36+GEB9bodcZ2lQ6ziY8Tot0Had9rk1Y2JB1mExQhmyXhPR9hwH5poe0ZX/GafH0viQk5kr/oE5sbY91fVjUEBFRLaJQIHf6OxBVeYWHoM19MAW00VDMgaXTxkmJR+rmTfusNwGH2B2tVmFhQ1YhPx8B+d0b5m1RJoN+6FjrJVSB9tzMwTO/JyFDL+0/PLmNPdb4OkHOooaIiGoZ0aMptBNnSWLymH+g3P19hV5nWDNpq00Iu6PVKixsyCpUe4Ml24YefjViQc4tV7LwYlgKtAUmYpnRzgGf9naCrIbN9kZERFRShn5PwdDJVxJT7d4C2dWLFXaNgoXN77G50Js47XNtwcKGqpz8n3OQR/8tiemfHGelbCqGKIr47EImXj2WhoJ/P9/vUhdLu9VjUUNERLWbICB3yhyY6jnnhUymB13ScrIr5BLd3FSor8l7vM3QiTgep62Qc1P1x8KGqpYoQvXLN5KQoX03mJq3tk4+FcAkinj3vxl4/0yGJC4TgNW+Tni9Qx0ILGqIiIiAuk7QTp0nCckS7kIdvLZCTi+XCXiiwCQCe2+xO1ptwcKGqpT8n3OQR52XxHQjJ1kpm/LTm0S8Ep6Kzy/el8TVcmDLABe82MbBSpkRERFVT8aOPaEbOEoSUx4Ngfx0eIWcf1iBwibkVi7ECl43h6onFjZUdUQRql++loQM7bvB1PoxKyVUPtkGE54/lIztMTmSeF2lgJ+H1MfTzWvOejxEREQVSTduOkwNm0lims1BEPItA1FWAxqrYZdv8evYLCMiU/TlPi9VfyxsqMrIL52B/EqkJKYbHWClbMonTWvC6P3J2B8r7bfbwE6GPU/WRx8PdRFHEhEREdQa5E5/F6I8b3pmISsD6o2BgKl8a8/YK2ToX2CxzhB2R6sVWNhQ1RBFqHYWaK3p0AOmVo9aKaGyu5tlxLCQRJxM0EninnXk2D/MDR1cVVbKjIiIyHa
YPNtA98wUSUzx93+hPPhLuc9dcHa0X67nsDtaLcDChqqE/O//Ql5gOkfdqMnWSaYcrqbr8URIIi6lSRcUe8xFidBhbmiRb2EwIiIiKp7+yXEwenWUxFQ7voTsVky5zjusmQaKfPP2RKUb2B2tFmBhQ5VPNEG1Y4MkZOjYE6ZW3lZKqGz+StJhaEgSbt+XLlLTy12FPUPrw8NebqXMiIiIbJRMjtxpCyDa5022I+j10KxbDOSWfQpoV40cAxtLu6P9dC2niL2ppmBhQ5XOOfIk5LeuSmK2NrYmIl6LEaFJSMqV9vt9sqkGO4fUh5Oav0pERERlIdb3gPbFNyUx2b1bUH+3qlzn9W9lL9n++VoOTOyOVqNZ/Wls48aN6NChA9zd3dGvXz8cP368yH3Dw8MxYcIEeHl5oWHDhujduze+++47i32cnJwsvq5cuVLZb4UKo9Oi0eFdkpC+hx9MLbysk08ZHLmrxZjfk5Ghl/4xnNjaHt/5ucBOwTVqiIiIysPQayD0fYdJYso/90PxZ2iZz/lkUw0c8v0/+k62EcfjdcUcQbbOqoXNzp07MX/+fLz11ls4evQounfvDn9/f9y+fbvQ/U+dOoV27drh22+/xYkTJzB16lS8/vrr+PHHHy32jYiIQFRUlPmrVatWlf12qBDKAz9DlZFi3hblCuj8X7JiRqVzMDYX4w4mIdsgLWpee8wRa32doJCxqCEiIqoI2udnw9jIUxJTf/sZhLs3y3Q+B6UMTxWYRODHmLJ3b6Pqz6qFzeeff47nnnsOkyZNgpeXF4KCguDu7o7NmzcXuv9bb72Fd999Fz179oSnpyemTp2K4cOHY/fu3Rb7urm5wd3d3fwll3P8Q5XLTIPqt62SkH7QaIhuDa2UUOnsuZmDCYeSkSsdUoN3O9fF4m71IAgsaoiIiCqM2g7amYsgqvLGxgi6XGg+XwzotMUcWLRnW0q7o+26kQOtkd3RaiqrFTY6nQ5//fUX/Pz8JHE/Pz+cPHmyxOfJzMyEk5OTRbx///7w8vLCiBEjcPTo0fKmS2Wg2v0dhJws87Zo7wjdiBesmFHJ7byWjUlhKdAXmEr/w2518XbHOtZJioiIqIYzNWkB7fOvSmLy2GtQB39epvMNaKyGa75xsOk6EQdjuaZNTWW1uWmTk5NhNBrh5uYmibu5uSEhIaFE5wgNDcWRI0ewf/9+c8zDwwOffvopOnfuDJ1Oh+3bt2PkyJHYs2cPfH19izxXdHR02d4IFUqTeBdtD+6SxO72GoqEe/EA4q2SU0n9Fi/Hh9EqmCBtkZnXSocn1HGIjo6zUma2i79ftRvvf+3G+1+7len+N2yN5u26w+XiKXNIGbYbsc4eSHu0W6lPN8BFiZ/uKc3bm88noI2OY22qQmX8/rdu3brI16y+6EbB7jyiKJaoi09ERAReeuklBAYGokuXLuZ469atJW+4e/fuuHXrFtasWVNsYVPcN4lKSRSh2fkFBFNeHy5TfXfUG/8S6qnUxRxofcHRWfggOk0SEwCs6eOE51s7FHoMFS86Opq/X7UY73/txvtfu5Xr/s9eBNOiaZDF3zGHPPdtRXbPfhDdG5fqVC/V1eKne0nm7aOpCjg1aQI3Ow5TqEzW+P23Wlc0V1dXyOVyi9aZpKQki1acgk6cOAF/f38sWLAAU6dO/ddrdenSBdeuXStXvlRyilNhUFw6K4lpx88AqnlRs/NaNmYdS5PE5ALwVT9nFjVERERVyc4BuTMWQVTktbQIOVkP1rcp5Xib7g1UaFU3r4jRm4DvozmJQE1ktcJGpVLBx8cHYWFhknhYWBh69OhR5HHHjh2Dv78/5s6dixkzZpToWpGRkXB3dy9XvlRCudlQbVsnCWW0eBTGrn2tlFDJ7L2Zg2lHU2HKN55QKQO+GeBiMfCQiIiIKp/Jsw1041+RxOQ3rkD9/epSnUcQBEz2kn5A+XVUFte0qYGsOivazJkzERwcjC1btiAqKgrz5s1DXFwcAgI
eLN64ePFijBgxwrx/eHg4/P39ERAQgLFjxyI+Ph7x8fFISsprXly3bh327NmDmJgY/PPPP1i8eDH27t2Ll16ynSmGbZnq1+8gS827H6JcgdihE4BqPIPY4bu5CDicgvwzOssFYHN/Fwxvbme9xIiIiGo5/aDRMHR5XBJTHtkLxeE9pTrPxEfsoc7X8+zWfSMO3SnbTGtUfVl1jM2YMWOQkpKCoKAgxMfHw9vbGzt27ECzZs0AAHFxcbh+/bp5/+DgYGRnZ2PNmjVYs2aNOd60aVNERkYCAPR6Pf7zn//g3r170Gg05nMOGTKkat9cLSTcuQHl/h2SmH6oP7SuHlbK6N+dT9bh+UMp0OWb/UwAsL6vM4saIiIiaxME5E6dC/vYa5LxNurvVsHU7BGYWrYt0WlcNHKM9rTDDzE55timy1kY3ERTzFFka4S0tDS2w1H5mYywW/oq5Fcv5oVc3JC97FtE375TLQeP3sw0YMjeRMTnSOd0XuPrhBfacExNReHg4dqN97924/2v3Sry/stir8Fu8QwIurypmk0uDZC9eANQ16lE5/hvgg6D9ybmnVMAzj/rjqaOVp9Lq0aqVZMHUM2iPLBTUtQAgHbCTEBTPcenpOQa8eyBZIuiZmn3eixqiIiIqhlTk5bQTp0jiclSEmC35j3AoC/RObq6KfGYS95kBCYR+DaKkwjUJCxsqNyEhLtQ/bRJEjN08oWxWz8rZVS8HIOICYdSEJ1ukMRntnPEzHaOVsqKiIiIimPoORC6Ic9KYvIrF6De8hlQgokABEHA1AKTCGyJzkKugZ2XagoWNlQ+ogj11yskTcOivQO0k96olhMGGE0iXjqSgpMJ0oW5xrSww5Juda2UFREREZWEbtx0GLw7SWLKI3uhPLCzRMf7t7JDHWXe80lCjgk/xLDVpqZgYUPlojiyt9A1a0Tn+lbKqGiiKGL+yXTsuZUriffxUOGLx50hq4aFGBEREeWjUCB31vswuTWShFXBn0Me+d9/PdxRKcMLbaTd5FdFZsJgYqtNTcDChspMiI+FOnitJGZo1wWGvsOslFHxVkXex1eXsySxR50U+N7PFWo5ixoiIiKb4FgPua8vhZhvHK8gmqD5/H3Ibv/7guwz29WBMt8T8PVMI3bdyCn6ALIZLGyobAx6aL5YAkGbrwuaWgNtwNvVsgvaD1ez8f6ZDEmssb0cPw6pDyc1fw2IiIhsialJC+S+8h+I+Z45hJwsaD6ZCyElodhjGzvIMb6VtNVm5YVMiFyw0+bxiY7KRLXza8ivR0li2gkzIbo1tFJGRQu7k4tZf6ZKYnVVAn4c4orGDvIijiIiIqLqzOjTC7px0yUxWWoSNJ/MB7LvF3vsa+0dkf9j2IupBvweywU7bR0LGyo1+T/noAzZJokZujwOQ/+nrZRR0S4k6/BiWAryT3iikgFb/VzxqLOy6AOJiIio2tMPHQvdoNGSmDz2GjRr3gP0uiKOAh6pp8RIT+lC3CsvZFZKjlR1WNhQ6WSkQb1+KYR8zbUm5/rInVL9uqDdum+A/4FkZOqlTctfPu6MxxuqrZQVERERVRhBgG7iLBg695GEFZfOQvPFEsBoKOJA4I0O0iUeIhJ0+ONObhF7ky1gYUMlZzRA88UHkKUmmUOiIED78juAYz0rJmYpVWvCs78XvgDnmJbVc9FQIiIiKgOZHLnT34Wx1aOSsOJMONRfLQdMxkIP6+iqwqDG0g863zudASNnSLNZLGyoxFQ7NlhM7awfNh7GAvPJW1uOQcSEg8m4UmABzhntHLgAJxERUU2k1iDnjY9gathUElaeOAj1NyuLXMDznc7SNez+TtFjO9e1sVksbKhEFBGHoArdIYkZ23aEbsxUK2VUOKNJxLSjKYgosADnaE87fNiterUqERERUQWq44ScuZ/CVGAiI+WRPVB9twowmSwO6VRfhWdbSsfaLD2biRwDW21sEQsb+leyWzFQbwqSxEzO9ZE7YxGgUFgpK0smUcS
sY2n47aa0f6yvhwpf9uUCnERERDWd6OKGnHmfwlRgoXDVoV1Qf72i0G5p73auC1W+J+I72UZ8ean4WdWoemJhQ8USkhOg+XQeBF2+9WoUSuTOXgKxnosVM5MyiSLeOJ6GbVelzcfeTgps5QKcREREtYbo1hA5cz+BqY6TJK48GgL1+o8Ag7SrumcdBV7ylnZVX3khE0m5hY/NoeqLhQ0VLSsTmk/mSiYLAADtC6/B1MrbSklZEkUR806m49sr0qKmiYMcPw525QKcREREtYzYqDly538KUz1nSVwZcQiatYsArbR3x9sd66CeKu9D0Ay9iIUn06skV6o4fOKjwum0sFv9LuR3bkjDA0dVq/VqdEYRM/9Mw1f/ZEniDe1l+G1ofTRxrD5d5YiIiKjqmJq0RM7C1TC5uEniinPHYLfsdQhpyeaYs1qGtzvWkey341oO9t/m9M+2hIUNWTIaoNnwEeSXz0vChi6PQ/f8bCslZSldZ8LYg8kILtD9rIGdDLuH1keLuixqiIiIajPRo+mD4satkSQuv34ZdktmQBZ73Ryb/qgj2jlLnx3ePJ6GDJ3lpANUPbGwISmDAZp1H0Dx3yOSsLH1Y8id/i4gk5f4VGlaE8Lu5OK7WAXWRGZie0w2Dt/NxY3MohfLKqmr6Xo8uTcRh+9qJXEXtQy7nqiP1vWU5b4GERER2T7RrSFyFq6CsUkLSVyWFA+7D2dB/tdxAIBSJuDzPs6Q5RuWeyfbiMVnMqoyXSoHfqRNefQ6aNYthuLsMUnY1LAZcl7/CFCpizgwT7rOhC8v3cePMTm4mvGwgFEBN6R/FFrVleOJphoMbWqH3u4qKGQlG9yvN4lYHXkfH5/PgLbAmD7POg/G1LCoISIiovxEFzfkvLsWmrXvQ/H3f81xIScLdisXQjdsAnTPTIVPfRVmt3PEqr/zZkXbdDkLI5rboV+jf38OIutiiw09oNNCs+Y9y6LGrSFy3v4YcKxbxIEPZOpNWHE+Ex1+jMOyc5n5iprCxWQYse5iFkaEJqHt9ji8dSIN4fe00Bex2m+GzoRtV7PRb3cClpy1LGq6uSlx8Gk3FjVERERUODsH5L6xDPoBwy1eUoVsg13gGxBSEjG/U120rCPtoTL1SAruZHGWtOqOLTYEIS0Zms/egfz6ZUnc1KARcuZ/BtG1QbHHh93JxSvhqYjLKVsf1KRcEzZdzsKmy1nQyIEOLip0rK+EWiYg2yDiXrYRYXdzUdSsiyM9NfjycRfYKTilMxERERVDoYB20psweTSDavsXEPIt2im/Egn7dwIgnzgLq3374+nQvMkFknJNePGPZIQMc+MSEtUYC5taTnYzGpqVCyymdDZ5NEXOvE8hFphJJD+tUcQHZzLw+cXCF7GSCQ/WkXlElQsPl3pIyDHhXrYRZ5J00BdRA+UagVOJOpxK1P1r7q5qGT7qUQ9jW9pB4OKbREREVBKCAP1QfxhbtoVm3WLJM5CQfR+ar5ZjUMfDWNp1Gt6Jyet+diZJjzkRaVjt61zYWakaYGFTiylOHIJ6c5Bk8U0AMDb2RO7cTyA6uRZ5bEy6AZMPpyAyRW/xWl2VgNntHDHtUUfUU8kQHR2N1q2dzK9n6EwIu6tFyK0c7Ludiwxd4d3PijOulR0+6l4PrpqST2ZARERE9JCpTXtkf7ARmvVLJeNuAEBxPgJzoy6ggfcYzHIcCK1cBQDYciUbbZ2UmNHOsbBTkpWxsKmN7mdAvWUllCfDLF4ytO+O3BnvAfZF/8L+djMHM8NTkaGXFiQyAZjdzhFvdKhT7KKYdVUyjPS0w0hPO2iNIv64k4tfrufg6D1tsd3ZmjnK8WxLO/i3tIe3M8fSEBERUTnVdULuW4FQ/v4TVD9thKDP6zEi5GYj4Nz3GGT/O972HI+f3boDgoCFpx4s3MnipvphYVPLyM9HQL05CLJ8i1I9pBvyLHTjpwPywn8sDCYRS85kSGYKeaiJgxwb+jqjt0fpZgxRywU82cw
OTzazAwDczXrQVe1KmgFKGWCvEGCvENDWSYlO9ZXsckZEREQVSyaDfuhYGHx6Q7MpEPIrkZKXm2YnYPul1Tjn2ByBzUZgp1t3LDyVDqMoYvZjdYo4KVkDC5taQhZ7Dart66G4cNLiNVGhhPaF12Do/3SRx9/INODlo6k4mWA59mW0px1W9nYqtpWmpBo5yNHIwQ5oXu5TEREREZWY6NEEOQtWQXloF1Q7N0PIln6Q2+n+TfxwaQ2u2HngsyZP4mNDb6RqTVjYqW6Jl62gysXCpoYT4mOh2rsNiqP7IIiW3byMzR6B9uWFMDVpWejxoihi29VszDuZjswCXc8UArC0ez1M83ZgSwoRERHZPpkM+sFjoO81EKpdW6D8YxcEo3Ra1jY5cVgX/TWCYoKx/WpPzPt7MF4b0wPN6rCbvLWxsKmJTCbIL56B8sDPkF84CUG0HJwvCjLon34OulGTAEXhv4gXU/R473Q6Dt3RWrzW0F6Gb/q7oIc7F6siIiKiGsaxHnTPz4Z+4Eiof94E+emjFs9TDiYtpsQdwZS4I7hzzAXX2vui5UA/oE17QMFHbGvgd72mMBkhuxIJxelwKM4chSwlschdjV4doZ3wCkwt2hb6+s1MA1acz8TWq9kobL3MJ5qosbaPM9zsOCMZERER1Vxiw2bInbUYwt2bUIX8AMXx3y1acACgsTYFjU//Bpz+DTqVHQSvDjA92glGrw4wNW0FqPhBcFVgYWOrsjIhi70OeXQk5NF/Qx79N4SszGIPMTVsCu3Yl2Hs5AsU6Dp2X29C6O1cfHclG0fuWbbQAICdXMDS7vUQ4GXPrmdERERUa4iNmkP7f/OgGx0A5ZE9kB0JgSItqdB9VbocIPLkgy8AolwOU+MWMHm2gamxJ0wNm8Hk0RSiqztbdiqY1b+bGzduxOrVqxEfH4+2bdti2bJl6N27d5H7X7x4EXPmzMHZs2fh7OyMyZMnY+7cuZIH7T///BPvvPMOLl++DA8PD7z22muYMmVKVbyd8jGZgNxsCDlZEHKygOwH/xXSkiGkJEKWmggh4S5kd29Clp5S4tMaPdtAP/gZGHoOhEEmR1yWETfvG3Ez04C/U/U4Ea/DhWQ9jMUsJ+ProcLKXk5o48T+o0RERFQ7ia4NoBszBRj5ImQXTuF2yB40uXoGDqbCPxQGAMFohPzWVchvXZWeSxAg1nGC6FwfopMrRKf6MD38t0MdwM4eosb+wX/tHB78W2NX5Oy1ZOXCZufOnZg/fz4++eQT9OzZExs3boS/vz8iIiLQtGlTi/0zMjIwevRo9O7dG3/88Qeio6Mxc+ZM2NvbY/bs2QCAGzduYOzYsZg4cSI2bNiAiIgIvPXWW3B1dcXIkSOr+i2WiJCeAvt5LzwoagoZD1MWokoDQ2df6AePwS33Nhi6Lxlp2xJw31C687eup8AHXetiaFMNW2mIiIiIAECugKlTbzTu1Bu3Uu7j0L4/4Xz+TwxKuQB3fUaJTiGIIoSMVCAjFbgZXeJLGwUZDDIFjDI59DIFDDIFnOyUEJRKiHIloJADcuWDBQYh/K+XjgAIyPfv/20DEM3b/wtY7PPg37qnnoOpbccS52kNQlpaWsU8SZfBwIED0a5dO6xevdoc69y5M0aOHIlFixZZ7L9p0ya8//77uHLlCuzsHqx7EhQUhM2bN+PSpUsQBAGLFi3Cb7/9hrNnz5qPmz17Ni5fvowDBw5U/psiIiIiIqIqV/6FR8pIp9Phr7/+gp+fnyTu5+eHkyct11oBgFOnTqFXr17mogZ4UBzdu3cPN2/eNO9T8JwDBw7EuXPnoNfrK/hdEBERERFRdWC1wiY5ORlGoxFubm6SuJubGxISEgo9JiEhodD9H75W3D4GgwHJyckVlT4REREREVUjVitsHio4bkMUxWLHchS2f8F4SfYhIiIiIqKaw2qFjaurK+RyuUXrTFJSkkWLy0MNGjQodH8gr+WmqH0UCgVcXFwqKn0iIiIiIqpGrFb
YqFQq+Pj4ICwsTBIPCwtDjx49Cj2me/fuOHHiBHJzcyX7N2zYEM2bNzfvc/jwYYtzdurUCUolpyomIiIiIqqJrNoVbebMmQgODsaWLVsQFRWFefPmIS4uDgEBAQCAxYsXY8SIEeb9n332WdjZ2WHGjBm4dOkSdu/ejc8++wwzZswwdzMLCAjA3bt3MX/+fERFRWHLli0IDg7GrFmzrPIeiYiIiIio8lm1sBkzZgyWLVuGoKAgPP7444iIiMCOHTvQrFkzAEBcXByuX79u3r9evXr45ZdfcO/ePQwYMABz5szBzJkzJUWLp6cnduzYgePHj+Pxxx/HihUrEBgYWG3XsLF1GzduRIcOHeDu7o5+/frh+PHjxe5/8eJFDBs2DB4eHvD29kZgYKB5DBTZntLc//DwcEyYMAFeXl5o2LAhevfuje+++64Ks6XKUNq/AQ/FxMSgSZMmaNy4cSVnSJWptPdfFEWsW7cO3bp1Q4MGDeDl5YX333+/apKlClfa+3/o0CEMHjwYTZo0QcuWLTFhwgRcvXq12GOo+jl27BjGjx8Pb29vODk5YevWrf96TFU9/1l1HRuybTt37sS0adMkC6wGBwcXu8Bq165d0bt3b8ydO9e8wOq8efPMC6yS7Sjt/f/kk0+Qk5ODQYMGwcPDA4cOHcLcuXPx5Zdfwt/f3wrvgMqrtD8DD+l0OgwePBju7u44duwY7ty5U4VZU0Upy/1fuHAh9u/fjw8++ADt2rVDeno64uPjMWTIkCrOnsqrtPf/xo0b6NGjB15++WVMnjwZ9+/fx6JFi3Djxg2cO3fOCu+Ayur3339HREQEOnbsiOnTp2PFihWYOHFikftX5fMfCxsqs8pYYJVsR2nvf2EmT54Mo9HIlhsbVdafgQULFiA9PR2+vr6YO3cuCxsbVdr7Hx0djV69euHYsWPw8vKqylSpEpT2/v/6668ICAhAYmIi5HI5AODo0aMYMWIEYmJi4OrqWmW5U8Vp3LgxPv7442ILm6p8/rP6dM9kmyprgVWyDWW5/4XJzMyEk5NTBWdHVaGsPwP79+/H/v37ERgYWNkpUiUqy/0PCQmBp6cnDh48iI4dO6J9+/aYPn06EhMTqyJlqkBluf8+Pj5QKpXYsmULjEYjMjMzsW3bNnTu3JlFTQ1Xlc9/LGyoTCprgVWyDWW5/wWFhobiyJEjmDx5ciVkSJWtLD8DcXFxeO2117B+/XrUqVOnKtKkSlKW+3/jxg3cvn0bO3fuxLp167B+/XpER0dj/PjxMJlMVZE2VZCy3P/mzZvjl19+wbJly9CgQQM0a9YMly5dwvbt26siZbKiqnz+Y2FD5VIZC6yS7Sjt/X8oIiICL730EgIDA9GlS5fKSo+qQGl+BqZNm4YpU6agW7duVZEaVYHS3H+TyQStVov169fD19cXvXv3xvr163HmzBmcPXu2KtKlClaa+x8fH4/Zs2dj/Pjx+OOPP7Bnzx44Ojpi8uTJLGxrgap6/mNhQ2VSWQuskm0oy/1/6MSJE/D398eCBQswderUykyTKlFZfgaOHj2KwMBAuLq6wtXVFbNnz0ZWVhZcXV3xzTffVEHWVFHKcv/d3d2hUCjwyCOPmGOtWrWCQqFAbGxspeZLFass9/+rr76Cvb09PvjgA3Ts2BG+vr7YsGEDjh07VqouzGR7qvL5j4UNlUllLbBKtqEs9x94MEWkv78/5s6dixkzZlR2mlSJyvIzcPz4cYSHh5u/Fi5cCDs7O4SHh2PUqFFVkDVVlLLc/549e8JgMEiWcbhx4wYMBkOxs+hR9VOW+5+Tk2OeNOChh9tssanZqvL5j4UNlVllLLBKtqO09z88PBz+/v4ICAjA2LFjER8fj/j4ePOnNmR7Svsz8Oijj0q+GjZsCJlMhkcffZSTSNig0t7//v37o2PHjpg5cybOnz+P8+fPY+bMmejatSs6depkrbdBZVTa+z9kyBCcP38ey5cvR0xMDP766y/MnDkTTZo0gY+
Pj5XeBZXF/fv3ceHCBVy4cAEmkwmxsbG4cOECbt++DcC6z3+KCj0b1SpjxoxBSkoKgoKCEB8fD29v7xItsPr2229jwIABcHJyslhglWxHae9/cHAwsrOzsWbNGqxZs8Ycb9q0KSIjI6s8fyq/0v4MUM1S2vsvk8mwfft2zJs3D0899RQ0Gg0GDBiApUuXQibj56y2prT3v1+/fti4cSNWrVqFNWvWQKPRoGvXrvjpp5/g4OBgrbdBZXDu3DkMHz7cvL1s2TIsW7YMEyZMwBdffGHV5z+uY0NERERERDaPH5EQEREREZHNY2FDREREREQ2j4UNERERERHZPBY2RERERERk81jYEBERERGRzWNhQ0RERERENo+FDRERERER2TwWNkRENdjWrVvh5ORU5FdoaCiio6Ph7u6OqVOnWhyv1WrRtWtX+Pj44MCBA8WeK//XzZs3S5SfyWTCDz/8gEGDBqFFixZo1KgRfHx8EBAQgIMHD1rsn5OTg5UrV6JPnz5o1KgRmjZtikGDBmHjxo0wGAwW+7dv3x7PPPNMode+efMmnJycsHLlSnNs2bJlkvdRv359tG/fHnPnzkVaWlqh5zlx4gQmTZqEtm3bws3NDS1atMCoUaOwdetWGI1G837Ffb9efPHFEn2/HnrjjTfQp08fNG/eHA0bNkT37t3x0UcfITMzs1TnISKqSRTWToCIiCrf/Pnz0aJFC4t4hw4d0KhRI7zxxhtYvnw5JkyYgEGDBplfDwoKwtWrV7Fz50489thjWL9+veT4hQsXwsPDA6+++qokXr9+/RLlNXfuXGzcuBGDBg3CnDlzoNFoEBMTg/379+Pnn3+W5JKYmIjRo0fj4sWLGDlyJKZMmQKDwYADBw7g7bffxm+//YZt27bB3t6+NN+aQgUFBaFu3brIzs7GkSNHsGHDBpw/fx6hoaEQBMG83/Lly7F8+XJ4enri+eefR/PmzZGeno4jR45g1qxZiIuLw1tvvWXev2/fvpg4caLF9Zo2bVqq/M6dO4e+ffuiefPm0Gg0OH/+PFatWoWwsDCEhoZCLpeX/c0TEdkoFjZERLXAwIED0a1btyJff/PNN7Fz5068+eabiIiIgL29Pf755x+sWrUKY8eOhZ+fHwBg3LhxkuM+/PBDeHh4WMRLIiEhAZs2bcK4ceMsCqalS5ciLi5OEpsxYwYuXryIb775BiNHjjTHp02bhnXr1mHhwoV47733sGLFilLnUtCIESPg7u4OAAgICEBAQAB++eUXnD17Fl26dAEA/Prrr1i+fDmefvppbNq0CWq12nz8rFmzcObMGVy+fFly3latWpXpe1XQ4cOHLWKtWrXCO++8g1OnTqFXr17lvgYRka1hVzQiIoJKpcLKlStx+/ZtLF++HCaTCa+99hocHR3x0UcfVco1b968CVEU4evrW+jrHh4e5n+fPn0aBw4cwPjx4yVFzUMzZsyAr68vvvnmG9y9e7fCc+3duzcA4Pr16+bY0qVL4eTkhHXr1kmKmoe6dOlSaOtMZWncuDEAICMjo8quSURUnbDFhoioFsjIyEBycrJF3NXV1fxvX19fPP/881i3bh2ys7Nx6tQprF27tsTdykrrYfer3bt345lnnoGDg0OR++7btw8A8NxzzxW5z4QJE3Ds2DEcPHiw1GNW/s2tW7cAAM7OzgCAmJgYXLlyBRMnTkTdunVLfJ7c3NxC74ODgwM0Gk2pctLr9cjIyIBWq8XFixfxwQcfoG7dusW2zBER1WQsbIiIaoGiBtDHxsbC0dHRvL1kyRKEhoZi48aN6NOnD55//vlKy8nDwwPPPfccgoOD4e3tDV9fX/Ts2RMDBw7EY489Jtn3YZeu9u3bF3m+h69FRUWVO7fU1FQoFApkZWXh6NGj2LhxI9zd3c2tSw+v0a5du1Kdd9u2bdi2bZtF/OOPP8a0adNKda5jx45h1KhR5m0vLy9s27YNLi4upToPEVFNwcKGiKgWCAwMhJeXl0Xczs5
Osq1Wq+Hg4IDExETzuJrKtHr1anTs2BFbt25FaGgo9u3bh0WLFqFz585Yv349WrduDQC4f/8+AKBOnTpFnuvhaxUxM1jPnj0l2x06dMDatWvNrSoPr5G/KCyJJ554Aq+88opF/OH7LI2OHTti165dyMzMxMmTJxEWFsZuaERUq7GwISKqBTp37lyiLkqBgYG4ceMGvL29sXLlSowfPx6NGjWqtLwUCgVefvllvPzyy0hPT0dERAR++OEH/PLLLxg/fjyOHz8OtVptLiAyMzPh5ORU6LnKWmzkn+XsoW+++QZOTk5ITEzEhg0bcPPmTck4modF1MOCq6QaNWqE/v37l+qYojg7O5vPNXz4cGzduhUTJ07EkSNHim3ZIiKqqTh5ABERAQD+/vtvrF27FhMnTsS2bdtgMBgwd+7cKrt+vXr18MQTT+Drr7/GuHHjEBMTg9OnTwOAubUpMjKyyOP//vtvAEDbtm3NMY1Gg5ycnEL3fxgvbOB/r1690L9/f/j7+2PXrl2wt7fH//3f/8FkMknyuXTpUmnfZqUZNWoURFHEzp07rZ0KEZFVsLAhIiKYTCa8/vrrcHJywpIlS+Dp6Yk5c+Zgz5495oH7VenhlMoPp3weOnQoABQ6PuWhbdu2QaFQSNa+adq0KWJiYgrd/8qVKwCAZs2aFZuLg4MDFixYgMjISPz8888AHkyt3KZNG+zdu7faLIqp0+kgiiK7oxFRrcXChoiI8NVXX+H06dP48MMPzYPPZ8+eDW9vb8yZMwdZWVkVfs34+PgiWzwOHjwIIG/sSffu3eHn54cffvgBe/bssdh//fr1+PPPPzF58mRJ17nBgwcjPj4ev/76q2R/g8GAzZs3w97evsjppvN75pln0KRJE6xcuRKiKAJ4sDhpamoqZs2aBZ1OZ3HM2bNnsXXr1n89d2mlpqbCaDRaxL/99lsAgI+PT4Vfk4jIFnCMDRFRLXDo0CFcu3bNIu7j44M6dergww8/RL9+/TB+/Hjza0qlEp9++imGDRuGZcuW4cMPP6zQnO7evQs/Pz/4+vqif//+8PDwQGpqKkJCQnDixAmMGDECHTp0MO//5ZdfYtSoUXjhhRcwevRoPP744zAYDDh48CD279+Pvn374oMPPpBcY/Lkyfj+++8xZcoUPPfcc/Dx8UFGRgZ2796Nc+fOYdmyZUWO2clPoVBg+vTpePfddxEaGoonn3wSo0aNwrx58xAYGIgLFy7g2WefRfPmzZGeno7w8HDs378f//nPfyTniYmJwfbt2y3OX69ePXOr1L8JCQlBUFAQhg8fjhYtWiA3NxfHjh1DSEgIOnXqVCELgBIR2SIhLS1NtHYSRERUObZu3YqZM2cW+fqSJUsQERGBQ4cO4fjx42jZsqXFPrNnz8a2bdvwxx9/SAoN4MEUy23atDF30SqNzMxMBAcH48CBA7h8+TISEhKgUqnwyCOP4Nlnn8XLL78MpVIpOSY7OxtffPEFdu7cievXr0MQBHh5eWHChAkICAiAQmH5eV16ejqCgoKwd+9exMbGQq1Wo3379pg+fbrFYp/Lli1DYGAgoqKi4O7ubpFvu3bt0LZtW/z+++/m+PHjx/Hll1/i5MmTSElJgaOjI3x8fDB+/Hj4+/tDJnvQOaK4Asrb2xsnTpwo0fft0qVLWLlyJU6ePImEhAQAQMuWLTF8+HC8+uqrxa4HRERUk7GwISIiIiIim8cxNkREREREZPM4xoaIiCpcfHx8sa+rVCo4OztXUTa2QafTITU1tdh9HBwcSr1ODxFRbcGuaEREVOH+bUC+r68v9u7dWzXJ2Ijw8HAMHz682H3mzZuHBQsWVFFGRES2hS02RERU4Xbt2lXs6yWZiay2ad++/b9+3zw9PaskFyIiW8QWGyIiIiIisnmcPICIiIiIiGweCxsiIiIiIrJ5LGyIiIiIiMjmsbAhIiIiIiKb9/97ZDpjvPhVdgAAAABJRU5ErkJggg==\n", 
"text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "kde_target('EXT_SOURCE_3', train)" ] }, { "cell_type": "markdown", "id": "4f57ac37", "metadata": {}, "source": [ "방금 만든 새로운 변수인데요, 다른 기관의 이전 대출 건수입니다.\n" ] }, { "cell_type": "code", "execution_count": 8, "id": "8ba7d9cd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The correlation between previous_loan_counts and the TARGET is -0.0100\n", "Median value for loan that was not repaid = 3.0000\n", "Median value for loan that was repaid = 4.0000\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "kde_target('previous_loan_counts', train)" ] }, { "cell_type": "markdown", "id": "d45be884", "metadata": {}, "source": [ "이것만으로 이 변수가 중요한지 알 수 없습니다. 상관 계수가 매우 약하고 분포에서 눈에 띄는 차이가 거의 없습니다.\n", "\n", "FBI 데이터 프레임에서 몇 가지 변수를 더 만들어 보겠습니다. 정보국 데이터 프레임에 있는 모든 숫자 열의 평균, 최소 및 최대값을 구하겠습니다.\n" ] }, { "cell_type": "markdown", "id": "9a7c08ba", "metadata": {}, "source": [ "### Aggregating Numeric Columns\n", "\n", "정보국 데이터 프레임의 숫자 정보를 설명하기 위해 모든 숫자 열에 대한 통계를 계산할 수 있습니다. 이를 위해 클라이언트 ID별로 그룹화하고 그룹화된 데이터 프레임을 집계한 후 결과를 교육 데이터에 다시 병합합니다. agg 함수는 연산이 유효한 것으로 간주되는 숫자 열의 값만 계산합니다. 우리는 'mean', 'max', 'min', 'sum'을 사용하겠지만 어떤 함수도 여기에 전달될 수 있습니다. 우리가 직접 함수를 작성하여 애그콜에 사용할 수도 있습니다.\n" ] }, { "cell_type": "code", "execution_count": 69, "id": "8e77b071", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " SK_ID_CURR DAYS_CREDIT CREDIT_DAY_OVERDUE \\\n", " count mean max min sum count \n", "0 100001 7 -735.000000 -49 -1572 -5145 7 \n", "1 100002 8 -874.000000 -103 -1437 -6992 8 \n", "2 100003 4 -1400.750000 -606 -2586 -5603 4 \n", "3 100004 2 -867.000000 -408 -1326 -1734 2 \n", "4 100005 3 -190.666667 -62 -373 -572 3 \n", "\n", " ... DAYS_CREDIT_UPDATE \\\n", " mean max min ... count mean max min sum \n", "0 0.0 0 0 ... 7 -93.142857 -6 -155 -652 \n", "1 0.0 0 0 ... 8 -499.875000 -7 -1185 -3999 \n", "2 0.0 0 0 ... 4 -816.000000 -43 -2131 -3264 \n", "3 0.0 0 0 ... 2 -532.000000 -382 -682 -1064 \n", "4 0.0 0 0 ... 3 -54.333333 -11 -121 -163 \n", "\n", " AMT_ANNUITY \n", " count mean max min sum \n", "0 7 3545.357143 10822.5 0.0 24817.5 \n", "1 7 0.000000 0.0 0.0 0.0 \n", "2 0 NaN NaN NaN 0.0 \n", "3 0 NaN NaN NaN 0.0 \n", "4 3 1420.500000 4261.5 0.0 4261.5 \n", "\n", "[5 rows x 61 columns]" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Group by the client id, calculate aggregation statistics\n", "bureau_agg = bureau.drop(columns = ['SK_ID_BUREAU']).groupby('SK_ID_CURR', as_index = False).agg(['count', 'mean', 'max', 'min', 'sum']).reset_index()\n", "bureau_agg.head()" ] }, { "cell_type": "code", "execution_count": 71, "id": "fddfbaee", "metadata": {}, "outputs": [], "source": [ "# List of column names\n", "columns = ['SK_ID_CURR']\n", "\n", "# Iterate through the variables names\n", "for var in bureau_agg.columns.levels[0]:\n", "    # Skip the id name\n", "    if var != 'SK_ID_CURR':\n", "\n", "        # Iterate through the stat names\n", "        for stat in bureau_agg.columns.levels[1][:-1]:\n", "            # Make a new column name for the variable and stat\n", "            columns.append('bureau_%s_%s' % (var, stat))" ] }, { "cell_type": "code", "execution_count": null, "id": "86eb2560", "metadata": {}, "outputs": [], "source": [ "# Practice: apply the same renaming pattern to the training data\n", "# (assumes `train` is already loaded; SK_ID_CURR is its client id column)\n", "train_agg = train.groupby('SK_ID_CURR').agg(['min']).reset_index()\n", "\n", "# Use a separate list so the bureau `columns` list above is not overwritten\n", "train_columns = ['SK_ID_CURR']\n", "for i in train_agg.columns.levels[0]:\n", "    # Skip the id name\n", "    if i != 'SK_ID_CURR':\n", "        for j in train_agg.columns.levels[1][:-1]:\n", "            train_columns.append('train_%s_%s' % (i, j))\n", "train_agg.columns = train_columns\n", "train_agg" ] }, { "cell_type": "code", "execution_count": 11, "id": "de0c50aa", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "FrozenList([['DAYS_CREDIT', 'CREDIT_DAY_OVERDUE', 'DAYS_CREDIT_ENDDATE', 'DAYS_ENDDATE_FACT', 'AMT_CREDIT_MAX_OVERDUE', 'CNT_CREDIT_PROLONG', 'AMT_CREDIT_SUM', 'AMT_CREDIT_SUM_DEBT', 'AMT_CREDIT_SUM_LIMIT', 'AMT_CREDIT_SUM_OVERDUE', 'DAYS_CREDIT_UPDATE', 'AMT_ANNUITY', 'SK_ID_CURR'], ['count', 'mean', 'max', 'min', 'sum', '']])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bureau_agg.columns.levels" ] }, { "cell_type": "code", "execution_count": 12, "id": "7a05a77d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " SK_ID_CURR bureau_DAYS_CREDIT_count bureau_DAYS_CREDIT_mean \\\n", "0 100001 7 -735.000000 \n", "1 100002 8 -874.000000 \n", "2 100003 4 -1400.750000 \n", "3 100004 2 -867.000000 \n", "4 100005 3 -190.666667 \n", "\n", " bureau_DAYS_CREDIT_max bureau_DAYS_CREDIT_min bureau_DAYS_CREDIT_sum \\\n", "0 -49 -1572 -5145 \n", "1 -103 -1437 -6992 \n", "2 -606 -2586 -5603 \n", "3 -408 -1326 -1734 \n", "4 -62 -373 -572 \n", "\n", " bureau_CREDIT_DAY_OVERDUE_count bureau_CREDIT_DAY_OVERDUE_mean \\\n", "0 7 0.0 \n", "1 8 0.0 \n", "2 4 0.0 \n", "3 2 0.0 \n", "4 3 0.0 \n", "\n", " bureau_CREDIT_DAY_OVERDUE_max bureau_CREDIT_DAY_OVERDUE_min ... \\\n", "0 0 0 ... \n", "1 0 0 ... \n", "2 0 0 ... \n", "3 0 0 ... \n", "4 0 0 ... \n", "\n", " bureau_DAYS_CREDIT_UPDATE_count bureau_DAYS_CREDIT_UPDATE_mean \\\n", "0 7 -93.142857 \n", "1 8 -499.875000 \n", "2 4 -816.000000 \n", "3 2 -532.000000 \n", "4 3 -54.333333 \n", "\n", " bureau_DAYS_CREDIT_UPDATE_max bureau_DAYS_CREDIT_UPDATE_min \\\n", "0 -6 -155 \n", "1 -7 -1185 \n", "2 -43 -2131 \n", "3 -382 -682 \n", "4 -11 -121 \n", "\n", " bureau_DAYS_CREDIT_UPDATE_sum bureau_AMT_ANNUITY_count \\\n", "0 -652 7 \n", "1 -3999 7 \n", "2 -3264 0 \n", "3 -1064 0 \n", "4 -163 3 \n", "\n", " bureau_AMT_ANNUITY_mean bureau_AMT_ANNUITY_max bureau_AMT_ANNUITY_min \\\n", "0 3545.357143 10822.5 0.0 \n", "1 0.000000 0.0 0.0 \n", "2 NaN NaN NaN \n", "3 NaN NaN NaN \n", "4 1420.500000 4261.5 0.0 \n", "\n", " bureau_AMT_ANNUITY_sum \n", "0 24817.5 \n", "1 0.0 \n", "2 0.0 \n", "3 0.0 \n", "4 4261.5 \n", "\n", "[5 rows x 61 columns]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Assign the list of columns names as the dataframe column names\n", "bureau_agg.columns = columns\n", "bureau_agg.head()" ] }, { "cell_type": "markdown", "id": "8e284469", "metadata": {}, "source": [ "Now we simply merge with the training data, as we did before.\n" ] }, { "cell_type": "code", "execution_count": 13, "id": "02b0eabc", "metadata": 
{}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " SK_ID_CURR TARGET NAME_CONTRACT_TYPE CODE_GENDER FLAG_OWN_CAR \\\n", "0 100002 1 Cash loans M N \n", "1 100003 0 Cash loans F N \n", "2 100004 0 Revolving loans M Y \n", "3 100006 0 Cash loans F N \n", "4 100007 0 Cash loans M N \n", "\n", " FLAG_OWN_REALTY CNT_CHILDREN AMT_INCOME_TOTAL AMT_CREDIT AMT_ANNUITY \\\n", "0 Y 0 202500.0 406597.5 24700.5 \n", "1 N 0 270000.0 1293502.5 35698.5 \n", "2 Y 0 67500.0 135000.0 6750.0 \n", "3 Y 0 135000.0 312682.5 29686.5 \n", "4 Y 0 121500.0 513000.0 21865.5 \n", "\n", " ... bureau_DAYS_CREDIT_UPDATE_count bureau_DAYS_CREDIT_UPDATE_mean \\\n", "0 ... 8.0 -499.875 \n", "1 ... 4.0 -816.000 \n", "2 ... 2.0 -532.000 \n", "3 ... NaN NaN \n", "4 ... 1.0 -783.000 \n", "\n", " bureau_DAYS_CREDIT_UPDATE_max bureau_DAYS_CREDIT_UPDATE_min \\\n", "0 -7.0 -1185.0 \n", "1 -43.0 -2131.0 \n", "2 -382.0 -682.0 \n", "3 NaN NaN \n", "4 -783.0 -783.0 \n", "\n", " bureau_DAYS_CREDIT_UPDATE_sum bureau_AMT_ANNUITY_count \\\n", "0 -3999.0 7.0 \n", "1 -3264.0 0.0 \n", "2 -1064.0 0.0 \n", "3 NaN NaN \n", "4 -783.0 0.0 \n", "\n", " bureau_AMT_ANNUITY_mean bureau_AMT_ANNUITY_max bureau_AMT_ANNUITY_min \\\n", "0 0.0 0.0 0.0 \n", "1 NaN NaN NaN \n", "2 NaN NaN NaN \n", "3 NaN NaN NaN \n", "4 NaN NaN NaN \n", "\n", " bureau_AMT_ANNUITY_sum \n", "0 0.0 \n", "1 0.0 \n", "2 0.0 \n", "3 NaN \n", "4 0.0 \n", "\n", "[5 rows x 183 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Merge with the training data\n", "train = train.merge(bureau_agg, on = 'SK_ID_CURR', how = 'left')\n", "train.head()" ] }, { "cell_type": "markdown", "id": "484fa2af", "metadata": {}, "source": [ "### Correlations of Aggregated Values with Target\n", "\n", "We can calculate the correlation of every new value with the target. 
Again, we can use these correlations as a rough approximation of which variables may be important for modeling.\n", "\n" ] }, { "cell_type": "code", "execution_count": 14, "id": "5ab2663d", "metadata": {}, "outputs": [], "source": [ "# List of new correlations\n", "new_corrs = []\n", "\n", "# Iterate through the columns\n", "for col in columns:\n", "    # Calculate correlation with the target\n", "    corr = train['TARGET'].corr(train[col])\n", "    # Append the result as a (column, correlation) tuple\n", "    new_corrs.append((col, corr))" ] }, { "cell_type": "markdown", "id": "7425a254", "metadata": {}, "source": [ "In the code below, we sort the correlations by magnitude (absolute value) using Python's built-in `sorted` function. We also make use of an anonymous `lambda` function, another important Python feature that is good to know.\n" ] }, { "cell_type": "code", "execution_count": 15, "id": "166b1e0b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('bureau_DAYS_CREDIT_mean', 0.08972896721998116),\n", " ('bureau_DAYS_CREDIT_min', 0.0752482510301037),\n", " ('bureau_DAYS_CREDIT_UPDATE_mean', 0.06892735266968673),\n", " ('bureau_DAYS_ENDDATE_FACT_min', 0.05588737984392085),\n", " ('bureau_DAYS_CREDIT_ENDDATE_sum', 0.053734895601020544),\n", " ('bureau_DAYS_ENDDATE_FACT_mean', 0.053199625857586336),\n", " ('bureau_DAYS_CREDIT_max', 0.049782054639973074),\n", " ('bureau_DAYS_ENDDATE_FACT_sum', 0.04885350261111597),\n", " ('bureau_DAYS_CREDIT_ENDDATE_mean', 0.046982754334835404),\n", " ('bureau_DAYS_CREDIT_UPDATE_min', 0.042863922470730204),\n", " ('bureau_DAYS_CREDIT_sum', 0.041999824814846765),\n", " ('bureau_DAYS_CREDIT_UPDATE_sum', 0.04140363535306015),\n", " ('bureau_DAYS_CREDIT_ENDDATE_max', 0.03658963469632896),\n", " ('bureau_DAYS_CREDIT_ENDDATE_min', 0.034281109921615996),\n", " ('bureau_DAYS_ENDDATE_FACT_count', -0.03049230665332553)]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Sort the correlations by the absolute value\n", "# Make sure to reverse to put the largest values at the front of list\n", "new_corrs = sorted(new_corrs, key = lambda x: abs(x[1]), reverse = True)\n", "new_corrs[:15]" ] }, { "cell_type": "markdown", "id": "5c0d7a70", "metadata": {}, "source": [ "None of the new variables has a strong correlation with the TARGET. 
We can look at the KDE plot of the variable with the highest correlation with the target in terms of absolute magnitude, `bureau_DAYS_CREDIT_mean`.\n" ] }, { "cell_type": "code", "execution_count": 16, "id": "3d3ff340", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The correlation between bureau_DAYS_CREDIT_mean and the TARGET is 0.0897\n", "Median value for loan that was not repaid = -835.3333\n", "Median value for loan that was repaid = -1067.0000\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "kde_target('bureau_DAYS_CREDIT_mean', train)" ] }, { "cell_type": "markdown", "id": "136dea9b", "metadata": {}, "source": [ "The definition of this column is: \"How many days before current application did client apply for Credit Bureau credit\".\n", "\n", "My interpretation is that this is the number of days before the application for a loan at Home Credit that the previous loan was applied for.\n", "\n", "Therefore, a larger negative number indicates that the previous loan was applied for further before the current loan application.\n", "\n", "We see an extremely weak positive relationship between the average of this variable and the target, meaning that clients who applied for loans further in the past are potentially more likely to repay loans at Home Credit. With a correlation this weak, though, it is just as likely to be noise as signal.\n", "\n", "### The Multiple Comparisons Problem\n", "\n", "When we have lots of variables, we expect some of them to be correlated with the target by pure chance, a problem known as multiple comparisons. We can make hundreds of features, and some will turn out to be correlated with the target simply because of random noise in the data. Then, when our model trains, it may overfit to these variables because it thinks they have a relationship with the target in the training set, but this does not necessarily generalize to the test set. There are many considerations that we have to take into account when making features!\n" ] }, { "cell_type": "markdown", "id": "e04bafd1", "metadata": {}, "source": [ "# Function for Numeric Aggregations\n" ] }, { "cell_type": "code", "execution_count": 17, "id": "d3fe33f0", "metadata": {}, "outputs": [], "source": [ "def agg_numeric(df, group_var, df_name):\n", "    \"\"\"Aggregates the numeric values in a dataframe. This can\n", "    be used to create features for each instance of the grouping variable.\n", "\n", "    Parameters\n", "    --------\n", "        df (dataframe):\n", "            the dataframe to calculate the statistics on\n", "        group_var (string):\n", "            the variable by which to group df\n", "        df_name (string):\n", "            the variable used to rename the columns\n", "\n", "    Return\n", "    --------\n", "        agg (dataframe):\n", "            a dataframe with the statistics aggregated for\n", "            all numeric columns. Each instance of the grouping variable will have\n", "            the statistics (count, mean, max, min, sum; currently supported) calculated.\n", "            The columns are also renamed to keep track of features created.\n", "\n", "    \"\"\"\n", "    # Remove id variables other than the grouping variable\n", "    for col in df:\n", "        if col != group_var and 'SK_ID' in col:\n", "            df = df.drop(columns = col)\n", "\n", "    group_ids = df[group_var]  # e.g. 'SK_ID_CURR'\n", "    # Copy so that adding the grouping column does not modify a view of df\n", "    numeric_df = df.select_dtypes('number').copy()\n", "    numeric_df[group_var] = group_ids\n", "\n", "    # Group by the specified variable and calculate the statistics\n", "    agg = numeric_df.groupby(group_var).agg(['count', 'mean', 'max', 'min', 'sum']).reset_index()\n", "\n", "    # Build the new column names, starting with the grouping variable\n", "    columns = [group_var]\n", "\n", "    for var in agg.columns.levels[0]:\n", "        # Skip the grouping variable\n", "        if var != group_var:\n", "            for stat in agg.columns.levels[1][:-1]:\n", "                columns.append('%s_%s_%s' % (df_name, var, stat))\n", "\n", "    agg.columns = columns\n", "    return agg" ] }, { "cell_type": "code", "execution_count": 18, "id": "3fd3470e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " SK_ID_CURR bureau_DAYS_CREDIT_count bureau_DAYS_CREDIT_mean \\\n", "0 100001 7 -735.000000 \n", "1 100002 8 -874.000000 \n", "2 100003 4 -1400.750000 \n", "3 100004 2 -867.000000 \n", "4 100005 3 -190.666667 \n", "\n", " bureau_DAYS_CREDIT_max bureau_DAYS_CREDIT_min bureau_DAYS_CREDIT_sum \\\n", "0 -49 -1572 -5145 \n", "1 -103 -1437 -6992 \n", "2 -606 -2586 -5603 \n", "3 -408 -1326 -1734 \n", "4 -62 -373 -572 \n", "\n", " bureau_CREDIT_DAY_OVERDUE_count bureau_CREDIT_DAY_OVERDUE_mean \\\n", "0 7 0.0 \n", "1 8 0.0 \n", "2 4 0.0 \n", "3 2 0.0 \n", "4 3 0.0 \n", "\n", " bureau_CREDIT_DAY_OVERDUE_max bureau_CREDIT_DAY_OVERDUE_min ... \\\n", "0 0 0 ... \n", "1 0 0 ... \n", "2 0 0 ... \n", "3 0 0 ... \n", "4 0 0 ... \n", "\n", " bureau_DAYS_CREDIT_UPDATE_count bureau_DAYS_CREDIT_UPDATE_mean \\\n", "0 7 -93.142857 \n", "1 8 -499.875000 \n", "2 4 -816.000000 \n", "3 2 -532.000000 \n", "4 3 -54.333333 \n", "\n", " bureau_DAYS_CREDIT_UPDATE_max bureau_DAYS_CREDIT_UPDATE_min \\\n", "0 -6 -155 \n", "1 -7 -1185 \n", "2 -43 -2131 \n", "3 -382 -682 \n", "4 -11 -121 \n", "\n", " bureau_DAYS_CREDIT_UPDATE_sum bureau_AMT_ANNUITY_count \\\n", "0 -652 7 \n", "1 -3999 7 \n", "2 -3264 0 \n", "3 -1064 0 \n", "4 -163 3 \n", "\n", " bureau_AMT_ANNUITY_mean bureau_AMT_ANNUITY_max bureau_AMT_ANNUITY_min \\\n", "0 3545.357143 10822.5 0.0 \n", "1 0.000000 0.0 0.0 \n", "2 NaN NaN NaN \n", "3 NaN NaN NaN \n", "4 1420.500000 4261.5 0.0 \n", "\n", " bureau_AMT_ANNUITY_sum \n", "0 24817.5 \n", "1 0.0 \n", "2 0.0 \n", "3 0.0 \n", "4 4261.5 \n", "\n", "[5 rows x 61 columns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bureau_agg_new = agg_numeric(bureau.drop(columns = ['SK_ID_BUREAU']), group_var = 'SK_ID_CURR', df_name = 'bureau')\n", "bureau_agg_new.head()" ] }, { "cell_type": "markdown", "id": "77683a96", "metadata": {}, "source": [ "To make sure the function works as intended, we should compare its output with the aggregated dataframe we constructed by hand.\n" ] }, { "cell_type": "code", 
"execution_count": 19, "id": "0f43643d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " SK_ID_CURR bureau_DAYS_CREDIT_count bureau_DAYS_CREDIT_mean \\\n", "0 100001 7 -735.000000 \n", "1 100002 8 -874.000000 \n", "2 100003 4 -1400.750000 \n", "3 100004 2 -867.000000 \n", "4 100005 3 -190.666667 \n", "\n", " bureau_DAYS_CREDIT_max bureau_DAYS_CREDIT_min bureau_DAYS_CREDIT_sum \\\n", "0 -49 -1572 -5145 \n", "1 -103 -1437 -6992 \n", "2 -606 -2586 -5603 \n", "3 -408 -1326 -1734 \n", "4 -62 -373 -572 \n", "\n", " bureau_CREDIT_DAY_OVERDUE_count bureau_CREDIT_DAY_OVERDUE_mean \\\n", "0 7 0.0 \n", "1 8 0.0 \n", "2 4 0.0 \n", "3 2 0.0 \n", "4 3 0.0 \n", "\n", " bureau_CREDIT_DAY_OVERDUE_max bureau_CREDIT_DAY_OVERDUE_min ... \\\n", "0 0 0 ... \n", "1 0 0 ... \n", "2 0 0 ... \n", "3 0 0 ... \n", "4 0 0 ... \n", "\n", " bureau_DAYS_CREDIT_UPDATE_count bureau_DAYS_CREDIT_UPDATE_mean \\\n", "0 7 -93.142857 \n", "1 8 -499.875000 \n", "2 4 -816.000000 \n", "3 2 -532.000000 \n", "4 3 -54.333333 \n", "\n", " bureau_DAYS_CREDIT_UPDATE_max bureau_DAYS_CREDIT_UPDATE_min \\\n", "0 -6 -155 \n", "1 -7 -1185 \n", "2 -43 -2131 \n", "3 -382 -682 \n", "4 -11 -121 \n", "\n", " bureau_DAYS_CREDIT_UPDATE_sum bureau_AMT_ANNUITY_count \\\n", "0 -652 7 \n", "1 -3999 7 \n", "2 -3264 0 \n", "3 -1064 0 \n", "4 -163 3 \n", "\n", " bureau_AMT_ANNUITY_mean bureau_AMT_ANNUITY_max bureau_AMT_ANNUITY_min \\\n", "0 3545.357143 10822.5 0.0 \n", "1 0.000000 0.0 0.0 \n", "2 NaN NaN NaN \n", "3 NaN NaN NaN \n", "4 1420.500000 4261.5 0.0 \n", "\n", " bureau_AMT_ANNUITY_sum \n", "0 24817.5 \n", "1 0.0 \n", "2 0.0 \n", "3 0.0 \n", "4 4261.5 \n", "\n", "[5 rows x 61 columns]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bureau_agg.head()" ] }, { "cell_type": "markdown", "id": "b54cf932", "metadata": {}, "source": [ "If we go through and inspect the values, we find that they are equivalent. We will be able to reuse this function to calculate numeric statistics for other dataframes. 
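Using functions gives consistent results and reduces the amount of work we have to do in the future!\n" ] }, { "cell_type": "markdown", "id": "504233ad", "metadata": {}, "source": [ "### Correlation Function\n", "Before moving on, we can also turn the code that calculates correlations with the target into a function.\n" ] }, { "cell_type": "code", "execution_count": 20, "id": "e5d5b569", "metadata": {}, "outputs": [], "source": [
"def target_corrs(df):\n",
"\n",
"    # List of (column, correlation with TARGET) pairs\n",
"    corrs = []\n",
"\n",
"    # Iterate through the columns\n",
"    for col in df.columns:\n",
"        print(col)\n",
"        # Skip the target column itself\n",
"        if col != 'TARGET':\n",
"            # Calculate the correlation with the target\n",
"            corr = df['TARGET'].corr(df[col])\n",
"\n",
"            corrs.append((col, corr))\n",
"\n",
"    # Sort by absolute magnitude of the correlation, strongest first\n",
"    corrs = sorted(corrs, key = lambda x: abs(x[1]), reverse = True)\n",
"\n",
"    return corrs" ] }, { "cell_type": "markdown", "id": "50f46cbd", "metadata": {}, "source": [ "# Categorical Variables\n", "\n", "Now we move from the numeric columns to the categorical columns. These are discrete string variables, so we cannot simply calculate statistics such as the mean and max, which only apply to numeric variables. Instead, we will rely on calculating the value counts of each category within each categorical variable. For example, suppose we have the following dataframe.\n" ] }, { "cell_type": "markdown", "id": "3c43df20", "metadata": {}, "source": [ "First, we one-hot encode the dataframe using only the categorical columns (`dtype == 'object'`).\n" ] }, { "cell_type": "code", "execution_count": 21, "id": "e4b1947c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "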
" ], "text/plain": [ " CREDIT_ACTIVE_Active CREDIT_ACTIVE_Bad debt CREDIT_ACTIVE_Closed \\\n", "0 0 0 1 \n", "1 1 0 0 \n", "2 1 0 0 \n", "3 1 0 0 \n", "4 1 0 0 \n", "\n", " CREDIT_ACTIVE_Sold CREDIT_CURRENCY_currency 1 CREDIT_CURRENCY_currency 2 \\\n", "0 0 1 0 \n", "1 0 1 0 \n", "2 0 1 0 \n", "3 0 1 0 \n", "4 0 1 0 \n", "\n", " CREDIT_CURRENCY_currency 3 CREDIT_CURRENCY_currency 4 \\\n", "0 0 0 \n", "1 0 0 \n", "2 0 0 \n", "3 0 0 \n", "4 0 0 \n", "\n", " CREDIT_TYPE_Another type of loan CREDIT_TYPE_Car loan ... \\\n", "0 0 0 ... \n", "1 0 0 ... \n", "2 0 0 ... \n", "3 0 0 ... \n", "4 0 0 ... \n", "\n", " CREDIT_TYPE_Loan for business development \\\n", "0 0 \n", "1 0 \n", "2 0 \n", "3 0 \n", "4 0 \n", "\n", " CREDIT_TYPE_Loan for purchase of shares (margin lending) \\\n", "0 0 \n", "1 0 \n", "2 0 \n", "3 0 \n", "4 0 \n", "\n", " CREDIT_TYPE_Loan for the purchase of equipment \\\n", "0 0 \n", "1 0 \n", "2 0 \n", "3 0 \n", "4 0 \n", "\n", " CREDIT_TYPE_Loan for working capital replenishment CREDIT_TYPE_Microloan \\\n", "0 0 0 \n", "1 0 0 \n", "2 0 0 \n", "3 0 0 \n", "4 0 0 \n", "\n", " CREDIT_TYPE_Mobile operator loan CREDIT_TYPE_Mortgage \\\n", "0 0 0 \n", "1 0 0 \n", "2 0 0 \n", "3 0 0 \n", "4 0 0 \n", "\n", " CREDIT_TYPE_Real estate loan CREDIT_TYPE_Unknown type of loan SK_ID_CURR \n", "0 0 0 215354 \n", "1 0 0 215354 \n", "2 0 0 215354 \n", "3 0 0 215354 \n", "4 0 0 215354 \n", "\n", "[5 rows x 24 columns]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "categorical = pd.get_dummies(bureau.select_dtypes('object'))\n", "categorical['SK_ID_CURR'] = bureau['SK_ID_CURR']\n", "categorical.head()" ] }, { "cell_type": "code", "execution_count": 22, "id": "88695246", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " CREDIT_ACTIVE_Active CREDIT_ACTIVE_Bad debt \\\n", " sum mean sum mean \n", "SK_ID_CURR \n", "100001 3 0.428571 0 0.0 \n", "100002 2 0.250000 0 0.0 \n", "100003 1 0.250000 0 0.0 \n", "100004 0 0.000000 0 0.0 \n", "100005 2 0.666667 0 0.0 \n", "\n", " CREDIT_ACTIVE_Closed CREDIT_ACTIVE_Sold \\\n", " sum mean sum mean \n", "SK_ID_CURR \n", "100001 4 0.571429 0 0.0 \n", "100002 6 0.750000 0 0.0 \n", "100003 3 0.750000 0 0.0 \n", "100004 2 1.000000 0 0.0 \n", "100005 1 0.333333 0 0.0 \n", "\n", " CREDIT_CURRENCY_currency 1 ... CREDIT_TYPE_Microloan \\\n", " sum mean ... sum mean \n", "SK_ID_CURR ... \n", "100001 7 1.0 ... 0 0.0 \n", "100002 8 1.0 ... 0 0.0 \n", "100003 4 1.0 ... 0 0.0 \n", "100004 2 1.0 ... 0 0.0 \n", "100005 3 1.0 ... 0 0.0 \n", "\n", " CREDIT_TYPE_Mobile operator loan CREDIT_TYPE_Mortgage \\\n", " sum mean sum mean \n", "SK_ID_CURR \n", "100001 0 0.0 0 0.0 \n", "100002 0 0.0 0 0.0 \n", "100003 0 0.0 0 0.0 \n", "100004 0 0.0 0 0.0 \n", "100005 0 0.0 0 0.0 \n", "\n", " CREDIT_TYPE_Real estate loan CREDIT_TYPE_Unknown type of loan \\\n", " sum mean sum \n", "SK_ID_CURR \n", "100001 0 0.0 0 \n", "100002 0 0.0 0 \n", "100003 0 0.0 0 \n", "100004 0 0.0 0 \n", "100005 0 0.0 0 \n", "\n", " \n", " mean \n", "SK_ID_CURR \n", "100001 0.0 \n", "100002 0.0 \n", "100003 0.0 \n", "100004 0.0 \n", "100005 0.0 \n", "\n", "[5 rows x 46 columns]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "categorical_grouped = categorical.groupby('SK_ID_CURR').agg(['sum', 'mean'])\n", "categorical_grouped.head()" ] }, { "cell_type": "code", "execution_count": 23, "id": "d5e1e1b7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['CREDIT_ACTIVE_Active', 'CREDIT_ACTIVE_Bad debt',\n", " 'CREDIT_ACTIVE_Closed', 'CREDIT_ACTIVE_Sold',\n", " 'CREDIT_CURRENCY_currency 1', 'CREDIT_CURRENCY_currency 2',\n", " 'CREDIT_CURRENCY_currency 3', 'CREDIT_CURRENCY_currency 4',\n", " 'CREDIT_TYPE_Another type of 
loan', 'CREDIT_TYPE_Car loan'],\n", " dtype='object')" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "categorical_grouped.columns.levels[0][:10]" ] }, { "cell_type": "code", "execution_count": 24, "id": "9fc53702", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['sum', 'mean'], dtype='object')" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "categorical_grouped.columns.levels[1]" ] }, { "cell_type": "code", "execution_count": 25, "id": "a7b764e3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " CREDIT_ACTIVE_Active_count CREDIT_ACTIVE_Active_count_norm \\\n", "SK_ID_CURR \n", "100001 3 0.428571 \n", "100002 2 0.250000 \n", "100003 1 0.250000 \n", "100004 0 0.000000 \n", "100005 2 0.666667 \n", "\n", " CREDIT_ACTIVE_Bad debt_count CREDIT_ACTIVE_Bad debt_count_norm \\\n", "SK_ID_CURR \n", "100001 0 0.0 \n", "100002 0 0.0 \n", "100003 0 0.0 \n", "100004 0 0.0 \n", "100005 0 0.0 \n", "\n", " CREDIT_ACTIVE_Closed_count CREDIT_ACTIVE_Closed_count_norm \\\n", "SK_ID_CURR \n", "100001 4 0.571429 \n", "100002 6 0.750000 \n", "100003 3 0.750000 \n", "100004 2 1.000000 \n", "100005 1 0.333333 \n", "\n", " CREDIT_ACTIVE_Sold_count CREDIT_ACTIVE_Sold_count_norm \\\n", "SK_ID_CURR \n", "100001 0 0.0 \n", "100002 0 0.0 \n", "100003 0 0.0 \n", "100004 0 0.0 \n", "100005 0 0.0 \n", "\n", " CREDIT_CURRENCY_currency 1_count \\\n", "SK_ID_CURR \n", "100001 7 \n", "100002 8 \n", "100003 4 \n", "100004 2 \n", "100005 3 \n", "\n", " CREDIT_CURRENCY_currency 1_count_norm ... \\\n", "SK_ID_CURR ... \n", "100001 1.0 ... \n", "100002 1.0 ... \n", "100003 1.0 ... \n", "100004 1.0 ... \n", "100005 1.0 ... 
\n", "\n", " CREDIT_TYPE_Microloan_count CREDIT_TYPE_Microloan_count_norm \\\n", "SK_ID_CURR \n", "100001 0 0.0 \n", "100002 0 0.0 \n", "100003 0 0.0 \n", "100004 0 0.0 \n", "100005 0 0.0 \n", "\n", " CREDIT_TYPE_Mobile operator loan_count \\\n", "SK_ID_CURR \n", "100001 0 \n", "100002 0 \n", "100003 0 \n", "100004 0 \n", "100005 0 \n", "\n", " CREDIT_TYPE_Mobile operator loan_count_norm \\\n", "SK_ID_CURR \n", "100001 0.0 \n", "100002 0.0 \n", "100003 0.0 \n", "100004 0.0 \n", "100005 0.0 \n", "\n", " CREDIT_TYPE_Mortgage_count CREDIT_TYPE_Mortgage_count_norm \\\n", "SK_ID_CURR \n", "100001 0 0.0 \n", "100002 0 0.0 \n", "100003 0 0.0 \n", "100004 0 0.0 \n", "100005 0 0.0 \n", "\n", " CREDIT_TYPE_Real estate loan_count \\\n", "SK_ID_CURR \n", "100001 0 \n", "100002 0 \n", "100003 0 \n", "100004 0 \n", "100005 0 \n", "\n", " CREDIT_TYPE_Real estate loan_count_norm \\\n", "SK_ID_CURR \n", "100001 0.0 \n", "100002 0.0 \n", "100003 0.0 \n", "100004 0.0 \n", "100005 0.0 \n", "\n", " CREDIT_TYPE_Unknown type of loan_count \\\n", "SK_ID_CURR \n", "100001 0 \n", "100002 0 \n", "100003 0 \n", "100004 0 \n", "100005 0 \n", "\n", " CREDIT_TYPE_Unknown type of loan_count_norm \n", "SK_ID_CURR \n", "100001 0.0 \n", "100002 0.0 \n", "100003 0.0 \n", "100004 0.0 \n", "100005 0.0 \n", "\n", "[5 rows x 46 columns]" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [
"group_var = 'SK_ID_CURR'\n",
"\n",
"# Need to create new column names\n",
"columns = []\n",
"\n",
"# Iterate through the variable names\n",
"for var in categorical_grouped.columns.levels[0]:\n",
"    # Skip the grouping variable\n",
"    if var != group_var:\n",
"        # Iterate through the stat names\n",
"        for stat in ['count', 'count_norm']:\n",
"            # Make a new column name for the variable and stat\n",
"            columns.append('%s_%s' % (var, stat))\n",
"\n",
"# Rename the columns\n",
"categorical_grouped.columns = columns\n",
"\n",
"categorical_grouped.head()" ] }, { "cell_type": "code", 
"execution_count": 26, "id": "080b5157", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " SK_ID_CURR TARGET NAME_CONTRACT_TYPE CODE_GENDER FLAG_OWN_CAR \\\n", "0 100002 1 Cash loans M N \n", "1 100003 0 Cash loans F N \n", "2 100004 0 Revolving loans M Y \n", "3 100006 0 Cash loans F N \n", "4 100007 0 Cash loans M N \n", "\n", " FLAG_OWN_REALTY CNT_CHILDREN AMT_INCOME_TOTAL AMT_CREDIT AMT_ANNUITY \\\n", "0 Y 0 202500.0 406597.5 24700.5 \n", "1 N 0 270000.0 1293502.5 35698.5 \n", "2 Y 0 67500.0 135000.0 6750.0 \n", "3 Y 0 135000.0 312682.5 29686.5 \n", "4 Y 0 121500.0 513000.0 21865.5 \n", "\n", " ... CREDIT_TYPE_Microloan_count CREDIT_TYPE_Microloan_count_norm \\\n", "0 ... 0.0 0.0 \n", "1 ... 0.0 0.0 \n", "2 ... 0.0 0.0 \n", "3 ... NaN NaN \n", "4 ... 0.0 0.0 \n", "\n", " CREDIT_TYPE_Mobile operator loan_count \\\n", "0 0.0 \n", "1 0.0 \n", "2 0.0 \n", "3 NaN \n", "4 0.0 \n", "\n", " CREDIT_TYPE_Mobile operator loan_count_norm CREDIT_TYPE_Mortgage_count \\\n", "0 0.0 0.0 \n", "1 0.0 0.0 \n", "2 0.0 0.0 \n", "3 NaN NaN \n", "4 0.0 0.0 \n", "\n", " CREDIT_TYPE_Mortgage_count_norm CREDIT_TYPE_Real estate loan_count \\\n", "0 0.0 0.0 \n", "1 0.0 0.0 \n", "2 0.0 0.0 \n", "3 NaN NaN \n", "4 0.0 0.0 \n", "\n", " CREDIT_TYPE_Real estate loan_count_norm \\\n", "0 0.0 \n", "1 0.0 \n", "2 0.0 \n", "3 NaN \n", "4 0.0 \n", "\n", " CREDIT_TYPE_Unknown type of loan_count \\\n", "0 0.0 \n", "1 0.0 \n", "2 0.0 \n", "3 NaN \n", "4 0.0 \n", "\n", " CREDIT_TYPE_Unknown type of loan_count_norm \n", "0 0.0 \n", "1 0.0 \n", "2 0.0 \n", "3 NaN \n", "4 0.0 \n", "\n", "[5 rows x 229 columns]" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train = train.merge(categorical_grouped, left_on = 'SK_ID_CURR', right_index = True, how = 'left')\n", "train.head()" ] }, { "cell_type": "code", "execution_count": 27, "id": "bc26a4a4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(307511, 229)" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train.shape" ] }, { 
"cell_type": "code", "execution_count": 28, "id": "451a2222", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " bureau_DAYS_CREDIT_count bureau_DAYS_CREDIT_mean bureau_DAYS_CREDIT_max \\\n", "0 8.0 -874.000000 -103.0 \n", "1 4.0 -1400.750000 -606.0 \n", "2 2.0 -867.000000 -408.0 \n", "3 NaN NaN NaN \n", "4 1.0 -1149.000000 -1149.0 \n", "5 3.0 -757.333333 -78.0 \n", "6 18.0 -1271.500000 -239.0 \n", "7 2.0 -1939.500000 -1138.0 \n", "8 4.0 -1773.000000 -1309.0 \n", "9 NaN NaN NaN \n", "\n", " bureau_DAYS_CREDIT_min bureau_DAYS_CREDIT_sum \\\n", "0 -1437.0 -6992.0 \n", "1 -2586.0 -5603.0 \n", "2 -1326.0 -1734.0 \n", "3 NaN NaN \n", "4 -1149.0 -1149.0 \n", "5 -1097.0 -2272.0 \n", "6 -2882.0 -22887.0 \n", "7 -2741.0 -3879.0 \n", "8 -2508.0 -7092.0 \n", "9 NaN NaN \n", "\n", " bureau_CREDIT_DAY_OVERDUE_count bureau_CREDIT_DAY_OVERDUE_mean \\\n", "0 8.0 0.0 \n", "1 4.0 0.0 \n", "2 2.0 0.0 \n", "3 NaN NaN \n", "4 1.0 0.0 \n", "5 3.0 0.0 \n", "6 18.0 0.0 \n", "7 2.0 0.0 \n", "8 4.0 0.0 \n", "9 NaN NaN \n", "\n", " bureau_CREDIT_DAY_OVERDUE_max bureau_CREDIT_DAY_OVERDUE_min \\\n", "0 0.0 0.0 \n", "1 0.0 0.0 \n", "2 0.0 0.0 \n", "3 NaN NaN \n", "4 0.0 0.0 \n", "5 0.0 0.0 \n", "6 0.0 0.0 \n", "7 0.0 0.0 \n", "8 0.0 0.0 \n", "9 NaN NaN \n", "\n", " bureau_CREDIT_DAY_OVERDUE_sum ... CREDIT_TYPE_Microloan_count \\\n", "0 0.0 ... 0.0 \n", "1 0.0 ... 0.0 \n", "2 0.0 ... 0.0 \n", "3 NaN ... NaN \n", "4 0.0 ... 0.0 \n", "5 0.0 ... 0.0 \n", "6 0.0 ... 0.0 \n", "7 0.0 ... 0.0 \n", "8 0.0 ... 0.0 \n", "9 NaN ... 
NaN \n", "\n", " CREDIT_TYPE_Microloan_count_norm CREDIT_TYPE_Mobile operator loan_count \\\n", "0 0.0 0.0 \n", "1 0.0 0.0 \n", "2 0.0 0.0 \n", "3 NaN NaN \n", "4 0.0 0.0 \n", "5 0.0 0.0 \n", "6 0.0 0.0 \n", "7 0.0 0.0 \n", "8 0.0 0.0 \n", "9 NaN NaN \n", "\n", " CREDIT_TYPE_Mobile operator loan_count_norm CREDIT_TYPE_Mortgage_count \\\n", "0 0.0 0.0 \n", "1 0.0 0.0 \n", "2 0.0 0.0 \n", "3 NaN NaN \n", "4 0.0 0.0 \n", "5 0.0 0.0 \n", "6 0.0 0.0 \n", "7 0.0 0.0 \n", "8 0.0 0.0 \n", "9 NaN NaN \n", "\n", " CREDIT_TYPE_Mortgage_count_norm CREDIT_TYPE_Real estate loan_count \\\n", "0 0.0 0.0 \n", "1 0.0 0.0 \n", "2 0.0 0.0 \n", "3 NaN NaN \n", "4 0.0 0.0 \n", "5 0.0 0.0 \n", "6 0.0 0.0 \n", "7 0.0 0.0 \n", "8 0.0 0.0 \n", "9 NaN NaN \n", "\n", " CREDIT_TYPE_Real estate loan_count_norm \\\n", "0 0.0 \n", "1 0.0 \n", "2 0.0 \n", "3 NaN \n", "4 0.0 \n", "5 0.0 \n", "6 0.0 \n", "7 0.0 \n", "8 0.0 \n", "9 NaN \n", "\n", " CREDIT_TYPE_Unknown type of loan_count \\\n", "0 0.0 \n", "1 0.0 \n", "2 0.0 \n", "3 NaN \n", "4 0.0 \n", "5 0.0 \n", "6 0.0 \n", "7 0.0 \n", "8 0.0 \n", "9 NaN \n", "\n", " CREDIT_TYPE_Unknown type of loan_count_norm \n", "0 0.0 \n", "1 0.0 \n", "2 0.0 \n", "3 NaN \n", "4 0.0 \n", "5 0.0 \n", "6 0.0 \n", "7 0.0 \n", "8 0.0 \n", "9 NaN \n", "\n", "[10 rows x 106 columns]" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train.iloc[:10, 123:]" ] }, { "cell_type": "markdown", "id": "0e1b1013", "metadata": {}, "source": [ "### Function to Handle Categorical Variables\n", "\n" ] }, { "cell_type": "code", "execution_count": 29, "id": "2c624e72", "metadata": {}, "outputs": [], "source": [
"def count_categorical(df, group_var, df_name):\n",
"    \"\"\"Computes counts and normalized counts for each observation\n",
"    of `group_var` of each unique category in every categorical variable\n",
"    \n",
"    Parameters\n",
"    --------\n",
"    df : dataframe \n",
"        The dataframe to calculate the value counts for.\n",
"        \n",
"    group_var : string\n",
"        The variable by which to group the dataframe. For each unique\n",
"        value of this variable, the final dataframe will have one row\n",
"        \n",
"    df_name : string\n",
"        Variable added to the front of column names to keep track of columns\n",
"\n",
"    \n",
"    Return\n",
"    --------\n",
"    categorical : dataframe\n",
"        A dataframe with counts and normalized counts of each unique category in every categorical variable\n",
"        with one row for every unique value of the `group_var`.\n",
"        \n",
"    \"\"\"\n",
"    \n",
"    # Select the categorical columns\n",
"    categorical = pd.get_dummies(df.select_dtypes('object'))\n",
"\n",
"    # Make sure to put the identifying id on the column\n",
"    categorical[group_var] = df[group_var]\n",
"\n",
"    # Groupby the group var and calculate the sum and mean\n",
"    categorical = categorical.groupby(group_var).agg(['sum', 'mean'])\n",
"    \n",
"    column_names = []\n",
"    \n",
"    # Iterate through the columns in level 0\n",
"    for var in categorical.columns.levels[0]:\n",
"        # Iterate through the stats in level 1\n",
"        for stat in ['count', 'count_norm']:\n",
"            # Make a new column name\n",
"            column_names.append('%s_%s_%s' % (df_name, var, stat))\n",
"    \n",
"    categorical.columns = column_names\n",
"    \n",
"    return categorical" ] }, { "cell_type": "code", "execution_count": 30, "id": "bd0c2c81", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " bureau_CREDIT_ACTIVE_Active_count \\\n", "SK_ID_CURR \n", "100001 3 \n", "100002 2 \n", "100003 1 \n", "100004 0 \n", "100005 2 \n", "\n", " bureau_CREDIT_ACTIVE_Active_count_norm \\\n", "SK_ID_CURR \n", "100001 0.428571 \n", "100002 0.250000 \n", "100003 0.250000 \n", "100004 0.000000 \n", "100005 0.666667 \n", "\n", " bureau_CREDIT_ACTIVE_Bad debt_count \\\n", "SK_ID_CURR \n", "100001 0 \n", "100002 0 \n", "100003 0 \n", "100004 0 \n", "100005 0 \n", "\n", " bureau_CREDIT_ACTIVE_Bad debt_count_norm \\\n", "SK_ID_CURR \n", "100001 0.0 \n", "100002 0.0 \n", "100003 0.0 \n", "100004 0.0 \n", "100005 0.0 \n", "\n", " bureau_CREDIT_ACTIVE_Closed_count \\\n", "SK_ID_CURR \n", "100001 4 \n", "100002 6 \n", "100003 3 \n", "100004 2 \n", "100005 1 \n", "\n", " bureau_CREDIT_ACTIVE_Closed_count_norm \\\n", "SK_ID_CURR \n", "100001 0.571429 \n", "100002 0.750000 \n", "100003 0.750000 \n", "100004 1.000000 \n", "100005 0.333333 \n", "\n", " bureau_CREDIT_ACTIVE_Sold_count \\\n", "SK_ID_CURR \n", "100001 0 \n", "100002 0 \n", "100003 0 \n", "100004 0 \n", "100005 0 \n", "\n", " bureau_CREDIT_ACTIVE_Sold_count_norm \\\n", "SK_ID_CURR \n", "100001 0.0 \n", "100002 0.0 \n", "100003 0.0 \n", "100004 0.0 \n", "100005 0.0 \n", "\n", " bureau_CREDIT_CURRENCY_currency 1_count \\\n", "SK_ID_CURR \n", "100001 7 \n", "100002 8 \n", "100003 4 \n", "100004 2 \n", "100005 3 \n", "\n", " bureau_CREDIT_CURRENCY_currency 1_count_norm ... \\\n", "SK_ID_CURR ... \n", "100001 1.0 ... \n", "100002 1.0 ... \n", "100003 1.0 ... \n", "100004 1.0 ... \n", "100005 1.0 ... 
\n", "\n", " bureau_CREDIT_TYPE_Microloan_count \\\n", "SK_ID_CURR \n", "100001 0 \n", "100002 0 \n", "100003 0 \n", "100004 0 \n", "100005 0 \n", "\n", " bureau_CREDIT_TYPE_Microloan_count_norm \\\n", "SK_ID_CURR \n", "100001 0.0 \n", "100002 0.0 \n", "100003 0.0 \n", "100004 0.0 \n", "100005 0.0 \n", "\n", " bureau_CREDIT_TYPE_Mobile operator loan_count \\\n", "SK_ID_CURR \n", "100001 0 \n", "100002 0 \n", "100003 0 \n", "100004 0 \n", "100005 0 \n", "\n", " bureau_CREDIT_TYPE_Mobile operator loan_count_norm \\\n", "SK_ID_CURR \n", "100001 0.0 \n", "100002 0.0 \n", "100003 0.0 \n", "100004 0.0 \n", "100005 0.0 \n", "\n", " bureau_CREDIT_TYPE_Mortgage_count \\\n", "SK_ID_CURR \n", "100001 0 \n", "100002 0 \n", "100003 0 \n", "100004 0 \n", "100005 0 \n", "\n", " bureau_CREDIT_TYPE_Mortgage_count_norm \\\n", "SK_ID_CURR \n", "100001 0.0 \n", "100002 0.0 \n", "100003 0.0 \n", "100004 0.0 \n", "100005 0.0 \n", "\n", " bureau_CREDIT_TYPE_Real estate loan_count \\\n", "SK_ID_CURR \n", "100001 0 \n", "100002 0 \n", "100003 0 \n", "100004 0 \n", "100005 0 \n", "\n", " bureau_CREDIT_TYPE_Real estate loan_count_norm \\\n", "SK_ID_CURR \n", "100001 0.0 \n", "100002 0.0 \n", "100003 0.0 \n", "100004 0.0 \n", "100005 0.0 \n", "\n", " bureau_CREDIT_TYPE_Unknown type of loan_count \\\n", "SK_ID_CURR \n", "100001 0 \n", "100002 0 \n", "100003 0 \n", "100004 0 \n", "100005 0 \n", "\n", " bureau_CREDIT_TYPE_Unknown type of loan_count_norm \n", "SK_ID_CURR \n", "100001 0.0 \n", "100002 0.0 \n", "100003 0.0 \n", "100004 0.0 \n", "100005 0.0 \n", "\n", "[5 rows x 46 columns]" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bureau_counts = count_categorical(bureau, group_var = 'SK_ID_CURR', df_name = 'bureau')\n", "bureau_counts.head()" ] }, { "cell_type": "markdown", "id": "9144a104", "metadata": {}, "source": [ "## Applying Operations to another dataframe\n" ] }, { "cell_type": "code", "execution_count": 31, "id": "0f15705d", "metadata": 
{}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SK_ID_BUREAUMONTHS_BALANCESTATUS
057154480C
15715448-1C
25715448-2C
35715448-3C
45715448-4C
\n", "
" ], "text/plain": [ " SK_ID_BUREAU MONTHS_BALANCE STATUS\n", "0 5715448 0 C\n", "1 5715448 -1 C\n", "2 5715448 -2 C\n", "3 5715448 -3 C\n", "4 5715448 -4 C" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bureau_balance = pd.read_csv('./input/bureau_balance.csv')\n", "bureau_balance.head()" ] }, { "cell_type": "code", "execution_count": 32, "id": "7556c5b9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
bureau_balance_STATUS_0_countbureau_balance_STATUS_0_count_normbureau_balance_STATUS_1_countbureau_balance_STATUS_1_count_normbureau_balance_STATUS_2_countbureau_balance_STATUS_2_count_normbureau_balance_STATUS_3_countbureau_balance_STATUS_3_count_normbureau_balance_STATUS_4_countbureau_balance_STATUS_4_count_normbureau_balance_STATUS_5_countbureau_balance_STATUS_5_count_normbureau_balance_STATUS_C_countbureau_balance_STATUS_C_count_normbureau_balance_STATUS_X_countbureau_balance_STATUS_X_count_norm
SK_ID_BUREAU
500170900.00000000.000.000.000.000.0860.886598110.113402
500171050.06024100.000.000.000.000.0480.578313300.361446
500171130.75000000.000.000.000.000.000.00000010.250000
5001712100.52631600.000.000.000.000.090.47368400.000000
500171300.00000000.000.000.000.000.000.000000221.000000
\n", "
" ], "text/plain": [ " bureau_balance_STATUS_0_count \\\n", "SK_ID_BUREAU \n", "5001709 0 \n", "5001710 5 \n", "5001711 3 \n", "5001712 10 \n", "5001713 0 \n", "\n", " bureau_balance_STATUS_0_count_norm \\\n", "SK_ID_BUREAU \n", "5001709 0.000000 \n", "5001710 0.060241 \n", "5001711 0.750000 \n", "5001712 0.526316 \n", "5001713 0.000000 \n", "\n", " bureau_balance_STATUS_1_count \\\n", "SK_ID_BUREAU \n", "5001709 0 \n", "5001710 0 \n", "5001711 0 \n", "5001712 0 \n", "5001713 0 \n", "\n", " bureau_balance_STATUS_1_count_norm \\\n", "SK_ID_BUREAU \n", "5001709 0.0 \n", "5001710 0.0 \n", "5001711 0.0 \n", "5001712 0.0 \n", "5001713 0.0 \n", "\n", " bureau_balance_STATUS_2_count \\\n", "SK_ID_BUREAU \n", "5001709 0 \n", "5001710 0 \n", "5001711 0 \n", "5001712 0 \n", "5001713 0 \n", "\n", " bureau_balance_STATUS_2_count_norm \\\n", "SK_ID_BUREAU \n", "5001709 0.0 \n", "5001710 0.0 \n", "5001711 0.0 \n", "5001712 0.0 \n", "5001713 0.0 \n", "\n", " bureau_balance_STATUS_3_count \\\n", "SK_ID_BUREAU \n", "5001709 0 \n", "5001710 0 \n", "5001711 0 \n", "5001712 0 \n", "5001713 0 \n", "\n", " bureau_balance_STATUS_3_count_norm \\\n", "SK_ID_BUREAU \n", "5001709 0.0 \n", "5001710 0.0 \n", "5001711 0.0 \n", "5001712 0.0 \n", "5001713 0.0 \n", "\n", " bureau_balance_STATUS_4_count \\\n", "SK_ID_BUREAU \n", "5001709 0 \n", "5001710 0 \n", "5001711 0 \n", "5001712 0 \n", "5001713 0 \n", "\n", " bureau_balance_STATUS_4_count_norm \\\n", "SK_ID_BUREAU \n", "5001709 0.0 \n", "5001710 0.0 \n", "5001711 0.0 \n", "5001712 0.0 \n", "5001713 0.0 \n", "\n", " bureau_balance_STATUS_5_count \\\n", "SK_ID_BUREAU \n", "5001709 0 \n", "5001710 0 \n", "5001711 0 \n", "5001712 0 \n", "5001713 0 \n", "\n", " bureau_balance_STATUS_5_count_norm \\\n", "SK_ID_BUREAU \n", "5001709 0.0 \n", "5001710 0.0 \n", "5001711 0.0 \n", "5001712 0.0 \n", "5001713 0.0 \n", "\n", " bureau_balance_STATUS_C_count \\\n", "SK_ID_BUREAU \n", "5001709 86 \n", "5001710 48 \n", "5001711 0 \n", "5001712 9 \n", "5001713 0 
\n", "\n", " bureau_balance_STATUS_C_count_norm \\\n", "SK_ID_BUREAU \n", "5001709 0.886598 \n", "5001710 0.578313 \n", "5001711 0.000000 \n", "5001712 0.473684 \n", "5001713 0.000000 \n", "\n", " bureau_balance_STATUS_X_count \\\n", "SK_ID_BUREAU \n", "5001709 11 \n", "5001710 30 \n", "5001711 1 \n", "5001712 0 \n", "5001713 22 \n", "\n", " bureau_balance_STATUS_X_count_norm \n", "SK_ID_BUREAU \n", "5001709 0.113402 \n", "5001710 0.361446 \n", "5001711 0.250000 \n", "5001712 0.000000 \n", "5001713 1.000000 " ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Counts of each type of status for each previous loan\n", "bureau_balance_counts = count_categorical(bureau_balance, group_var = 'SK_ID_BUREAU', df_name = 'bureau_balance')\n", "bureau_balance_counts.head()" ] }, { "cell_type": "markdown", "id": "ade196e0", "metadata": {}, "source": [ "Now we can handle the one numeric column. The `MONTHS_BALANCE` column holds the \"months of balance relative to the application date\". It may not be especially informative as a plain numeric variable, and future work could treat it as a time variable instead. For now, we just compute the same aggregation statistics as before.\n" ] }, { "cell_type": "code", "execution_count": 33, "id": "88b09dba", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SK_ID_BUREAUbureau_balance_MONTHS_BALANCE_countbureau_balance_MONTHS_BALANCE_meanbureau_balance_MONTHS_BALANCE_maxbureau_balance_MONTHS_BALANCE_minbureau_balance_MONTHS_BALANCE_sum
0500170997-48.00-96-4656
1500171083-41.00-82-3403
250017114-1.50-3-6
3500171219-9.00-18-171
4500171322-10.50-21-231
\n", "
" ], "text/plain": [ " SK_ID_BUREAU bureau_balance_MONTHS_BALANCE_count \\\n", "0 5001709 97 \n", "1 5001710 83 \n", "2 5001711 4 \n", "3 5001712 19 \n", "4 5001713 22 \n", "\n", " bureau_balance_MONTHS_BALANCE_mean bureau_balance_MONTHS_BALANCE_max \\\n", "0 -48.0 0 \n", "1 -41.0 0 \n", "2 -1.5 0 \n", "3 -9.0 0 \n", "4 -10.5 0 \n", "\n", " bureau_balance_MONTHS_BALANCE_min bureau_balance_MONTHS_BALANCE_sum \n", "0 -96 -4656 \n", "1 -82 -3403 \n", "2 -3 -6 \n", "3 -18 -171 \n", "4 -21 -231 " ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Calculate value count statistics for each `SK_ID_CURR` \n", "bureau_balance_agg = agg_numeric(bureau_balance, group_var = 'SK_ID_BUREAU', df_name = 'bureau_balance')\n", "bureau_balance_agg.head()" ] }, { "cell_type": "code", "execution_count": 34, "id": "75eaee43", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SK_ID_BUREAUbureau_balance_MONTHS_BALANCE_countbureau_balance_MONTHS_BALANCE_meanbureau_balance_MONTHS_BALANCE_maxbureau_balance_MONTHS_BALANCE_minbureau_balance_MONTHS_BALANCE_sumbureau_balance_STATUS_0_countbureau_balance_STATUS_0_count_normbureau_balance_STATUS_1_countbureau_balance_STATUS_1_count_norm...bureau_balance_STATUS_3_count_normbureau_balance_STATUS_4_countbureau_balance_STATUS_4_count_normbureau_balance_STATUS_5_countbureau_balance_STATUS_5_count_normbureau_balance_STATUS_C_countbureau_balance_STATUS_C_count_normbureau_balance_STATUS_X_countbureau_balance_STATUS_X_count_normSK_ID_CURR
0500170997-48.00-96-465600.00000000.0...0.000.000.0860.886598110.113402NaN
1500171083-41.00-82-340350.06024100.0...0.000.000.0480.578313300.361446162368.0
250017114-1.50-3-630.75000000.0...0.000.000.000.00000010.250000162368.0
3500171219-9.00-18-171100.52631600.0...0.000.000.090.47368400.000000162368.0
4500171322-10.50-21-23100.00000000.0...0.000.000.000.000000221.000000150635.0
\n", "

5 rows × 23 columns

\n", "
" ], "text/plain": [ " SK_ID_BUREAU bureau_balance_MONTHS_BALANCE_count \\\n", "0 5001709 97 \n", "1 5001710 83 \n", "2 5001711 4 \n", "3 5001712 19 \n", "4 5001713 22 \n", "\n", " bureau_balance_MONTHS_BALANCE_mean bureau_balance_MONTHS_BALANCE_max \\\n", "0 -48.0 0 \n", "1 -41.0 0 \n", "2 -1.5 0 \n", "3 -9.0 0 \n", "4 -10.5 0 \n", "\n", " bureau_balance_MONTHS_BALANCE_min bureau_balance_MONTHS_BALANCE_sum \\\n", "0 -96 -4656 \n", "1 -82 -3403 \n", "2 -3 -6 \n", "3 -18 -171 \n", "4 -21 -231 \n", "\n", " bureau_balance_STATUS_0_count bureau_balance_STATUS_0_count_norm \\\n", "0 0 0.000000 \n", "1 5 0.060241 \n", "2 3 0.750000 \n", "3 10 0.526316 \n", "4 0 0.000000 \n", "\n", " bureau_balance_STATUS_1_count bureau_balance_STATUS_1_count_norm ... \\\n", "0 0 0.0 ... \n", "1 0 0.0 ... \n", "2 0 0.0 ... \n", "3 0 0.0 ... \n", "4 0 0.0 ... \n", "\n", " bureau_balance_STATUS_3_count_norm bureau_balance_STATUS_4_count \\\n", "0 0.0 0 \n", "1 0.0 0 \n", "2 0.0 0 \n", "3 0.0 0 \n", "4 0.0 0 \n", "\n", " bureau_balance_STATUS_4_count_norm bureau_balance_STATUS_5_count \\\n", "0 0.0 0 \n", "1 0.0 0 \n", "2 0.0 0 \n", "3 0.0 0 \n", "4 0.0 0 \n", "\n", " bureau_balance_STATUS_5_count_norm bureau_balance_STATUS_C_count \\\n", "0 0.0 86 \n", "1 0.0 48 \n", "2 0.0 0 \n", "3 0.0 9 \n", "4 0.0 0 \n", "\n", " bureau_balance_STATUS_C_count_norm bureau_balance_STATUS_X_count \\\n", "0 0.886598 11 \n", "1 0.578313 30 \n", "2 0.000000 1 \n", "3 0.473684 0 \n", "4 0.000000 22 \n", "\n", " bureau_balance_STATUS_X_count_norm SK_ID_CURR \n", "0 0.113402 NaN \n", "1 0.361446 162368.0 \n", "2 0.250000 162368.0 \n", "3 0.000000 162368.0 \n", "4 1.000000 150635.0 \n", "\n", "[5 rows x 23 columns]" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Dataframe grouped by the loan\n", "bureau_by_loan = bureau_balance_agg.merge(bureau_balance_counts, right_index = True, left_on = 'SK_ID_BUREAU', how = 'outer')\n", "\n", "# Merge to include the 
SK_ID_CURR\n", "bureau_by_loan = bureau_by_loan.merge(bureau[['SK_ID_BUREAU', 'SK_ID_CURR']], on = 'SK_ID_BUREAU', how = 'left')\n", "\n", "bureau_by_loan.head()" ] }, { "cell_type": "code", "execution_count": 35, "id": "4310c3b3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SK_ID_CURRclient_bureau_balance_MONTHS_BALANCE_count_countclient_bureau_balance_MONTHS_BALANCE_count_meanclient_bureau_balance_MONTHS_BALANCE_count_maxclient_bureau_balance_MONTHS_BALANCE_count_minclient_bureau_balance_MONTHS_BALANCE_count_sumclient_bureau_balance_MONTHS_BALANCE_mean_countclient_bureau_balance_MONTHS_BALANCE_mean_meanclient_bureau_balance_MONTHS_BALANCE_mean_maxclient_bureau_balance_MONTHS_BALANCE_mean_min...client_bureau_balance_STATUS_X_count_countclient_bureau_balance_STATUS_X_count_meanclient_bureau_balance_STATUS_X_count_maxclient_bureau_balance_STATUS_X_count_minclient_bureau_balance_STATUS_X_count_sumclient_bureau_balance_STATUS_X_count_norm_countclient_bureau_balance_STATUS_X_count_norm_meanclient_bureau_balance_STATUS_X_count_norm_maxclient_bureau_balance_STATUS_X_count_norm_minclient_bureau_balance_STATUS_X_count_norm_sum
0100001.0724.5714295221727-11.785714-0.5-25.5...74.2857149030.070.2145900.5000000.01.502129
1100002.0813.7500002241108-21.875000-1.5-39.5...81.8750003015.080.1619320.5000000.01.295455
2100005.037.000000133213-3.000000-1.0-6.0...30.666667102.030.1367520.3333330.00.410256
3100010.0236.0000003636722-46.000000-19.5-72.5...20.000000000.020.0000000.0000000.00.000000
4100013.0457.50000069402304-28.250000-19.5-34.0...410.25000040041.040.2545451.0000000.01.018182
\n", "

5 rows × 106 columns

\n", "
" ], "text/plain": [ " SK_ID_CURR client_bureau_balance_MONTHS_BALANCE_count_count \\\n", "0 100001.0 7 \n", "1 100002.0 8 \n", "2 100005.0 3 \n", "3 100010.0 2 \n", "4 100013.0 4 \n", "\n", " client_bureau_balance_MONTHS_BALANCE_count_mean \\\n", "0 24.571429 \n", "1 13.750000 \n", "2 7.000000 \n", "3 36.000000 \n", "4 57.500000 \n", "\n", " client_bureau_balance_MONTHS_BALANCE_count_max \\\n", "0 52 \n", "1 22 \n", "2 13 \n", "3 36 \n", "4 69 \n", "\n", " client_bureau_balance_MONTHS_BALANCE_count_min \\\n", "0 2 \n", "1 4 \n", "2 3 \n", "3 36 \n", "4 40 \n", "\n", " client_bureau_balance_MONTHS_BALANCE_count_sum \\\n", "0 172 \n", "1 110 \n", "2 21 \n", "3 72 \n", "4 230 \n", "\n", " client_bureau_balance_MONTHS_BALANCE_mean_count \\\n", "0 7 \n", "1 8 \n", "2 3 \n", "3 2 \n", "4 4 \n", "\n", " client_bureau_balance_MONTHS_BALANCE_mean_mean \\\n", "0 -11.785714 \n", "1 -21.875000 \n", "2 -3.000000 \n", "3 -46.000000 \n", "4 -28.250000 \n", "\n", " client_bureau_balance_MONTHS_BALANCE_mean_max \\\n", "0 -0.5 \n", "1 -1.5 \n", "2 -1.0 \n", "3 -19.5 \n", "4 -19.5 \n", "\n", " client_bureau_balance_MONTHS_BALANCE_mean_min ... \\\n", "0 -25.5 ... \n", "1 -39.5 ... \n", "2 -6.0 ... \n", "3 -72.5 ... \n", "4 -34.0 ... 
\n", "\n", " client_bureau_balance_STATUS_X_count_count \\\n", "0 7 \n", "1 8 \n", "2 3 \n", "3 2 \n", "4 4 \n", "\n", " client_bureau_balance_STATUS_X_count_mean \\\n", "0 4.285714 \n", "1 1.875000 \n", "2 0.666667 \n", "3 0.000000 \n", "4 10.250000 \n", "\n", " client_bureau_balance_STATUS_X_count_max \\\n", "0 9 \n", "1 3 \n", "2 1 \n", "3 0 \n", "4 40 \n", "\n", " client_bureau_balance_STATUS_X_count_min \\\n", "0 0 \n", "1 0 \n", "2 0 \n", "3 0 \n", "4 0 \n", "\n", " client_bureau_balance_STATUS_X_count_sum \\\n", "0 30.0 \n", "1 15.0 \n", "2 2.0 \n", "3 0.0 \n", "4 41.0 \n", "\n", " client_bureau_balance_STATUS_X_count_norm_count \\\n", "0 7 \n", "1 8 \n", "2 3 \n", "3 2 \n", "4 4 \n", "\n", " client_bureau_balance_STATUS_X_count_norm_mean \\\n", "0 0.214590 \n", "1 0.161932 \n", "2 0.136752 \n", "3 0.000000 \n", "4 0.254545 \n", "\n", " client_bureau_balance_STATUS_X_count_norm_max \\\n", "0 0.500000 \n", "1 0.500000 \n", "2 0.333333 \n", "3 0.000000 \n", "4 1.000000 \n", "\n", " client_bureau_balance_STATUS_X_count_norm_min \\\n", "0 0.0 \n", "1 0.0 \n", "2 0.0 \n", "3 0.0 \n", "4 0.0 \n", "\n", " client_bureau_balance_STATUS_X_count_norm_sum \n", "0 1.502129 \n", "1 1.295455 \n", "2 0.410256 \n", "3 0.000000 \n", "4 1.018182 \n", "\n", "[5 rows x 106 columns]" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bureau_balance_by_client = agg_numeric(bureau_by_loan.drop(columns = ['SK_ID_BUREAU']), group_var = 'SK_ID_CURR', df_name = 'client')\n", "bureau_balance_by_client.head()" ] }, { "cell_type": "markdown", "id": "4b4b3c7c", "metadata": {}, "source": [ "# Putting the Functions Together\n" ] }, { "cell_type": "code", "execution_count": 36, "id": "8085bb12", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "7409" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Free up memory by deleting old objects\n", "import gc\n", "gc.enable()\n", "del train, bureau, 
bureau_balance, bureau_agg, bureau_agg_new, bureau_balance_agg, bureau_balance_counts, bureau_by_loan, bureau_balance_by_client, bureau_counts\n", "gc.collect()" ] }, { "cell_type": "code", "execution_count": 37, "id": "8e8633cc", "metadata": {}, "outputs": [], "source": [ "# Read in new copies of all the dataframes\n", "train = pd.read_csv('./input/application_train.csv') # (307511, 122), (307511, 106), (307511, 16)\n", "bureau = pd.read_csv('./input/bureau.csv') # (1716428, 17), (1716428, 14), (1716428, 3)\n", "bureau_balance = pd.read_csv('./input/bureau_balance.csv') # (27299925, 3), (27299925, 2), (27299925, 1)" ] }, { "cell_type": "code", "execution_count": 38, "id": "7ac11416", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SK_ID_CURRSK_ID_BUREAUCREDIT_ACTIVECREDIT_CURRENCYDAYS_CREDITCREDIT_DAY_OVERDUEDAYS_CREDIT_ENDDATEDAYS_ENDDATE_FACTAMT_CREDIT_MAX_OVERDUECNT_CREDIT_PROLONGAMT_CREDIT_SUMAMT_CREDIT_SUM_DEBTAMT_CREDIT_SUM_LIMITAMT_CREDIT_SUM_OVERDUECREDIT_TYPEDAYS_CREDIT_UPDATEAMT_ANNUITY
17164232593555057750Activecurrency 1-440-30.0NaN0.0011250.0011250.00.00.0Microloan-19NaN
17164241000445057754Closedcurrency 1-26480-2433.0-2493.05476.5038130.840.00.00.0Consumer credit-2493NaN
17164251000445057762Closedcurrency 1-18090-1628.0-970.0NaN015570.00NaNNaN0.0Consumer credit-967NaN
17164262468295057770Closedcurrency 1-18780-1513.0-1513.0NaN036000.000.00.00.0Consumer credit-1508NaN
17164272468295057778Closedcurrency 1-4630NaN-387.0NaN022500.000.0NaN0.0Microloan-387NaN
\n", "
" ], "text/plain": [ " SK_ID_CURR SK_ID_BUREAU CREDIT_ACTIVE CREDIT_CURRENCY DAYS_CREDIT \\\n", "1716423 259355 5057750 Active currency 1 -44 \n", "1716424 100044 5057754 Closed currency 1 -2648 \n", "1716425 100044 5057762 Closed currency 1 -1809 \n", "1716426 246829 5057770 Closed currency 1 -1878 \n", "1716427 246829 5057778 Closed currency 1 -463 \n", "\n", " CREDIT_DAY_OVERDUE DAYS_CREDIT_ENDDATE DAYS_ENDDATE_FACT \\\n", "1716423 0 -30.0 NaN \n", "1716424 0 -2433.0 -2493.0 \n", "1716425 0 -1628.0 -970.0 \n", "1716426 0 -1513.0 -1513.0 \n", "1716427 0 NaN -387.0 \n", "\n", " AMT_CREDIT_MAX_OVERDUE CNT_CREDIT_PROLONG AMT_CREDIT_SUM \\\n", "1716423 0.0 0 11250.00 \n", "1716424 5476.5 0 38130.84 \n", "1716425 NaN 0 15570.00 \n", "1716426 NaN 0 36000.00 \n", "1716427 NaN 0 22500.00 \n", "\n", " AMT_CREDIT_SUM_DEBT AMT_CREDIT_SUM_LIMIT AMT_CREDIT_SUM_OVERDUE \\\n", "1716423 11250.0 0.0 0.0 \n", "1716424 0.0 0.0 0.0 \n", "1716425 NaN NaN 0.0 \n", "1716426 0.0 0.0 0.0 \n", "1716427 0.0 NaN 0.0 \n", "\n", " CREDIT_TYPE DAYS_CREDIT_UPDATE AMT_ANNUITY \n", "1716423 Microloan -19 NaN \n", "1716424 Consumer credit -2493 NaN \n", "1716425 Consumer credit -967 NaN \n", "1716426 Consumer credit -1508 NaN \n", "1716427 Microloan -387 NaN " ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bureau.tail()" ] }, { "cell_type": "markdown", "id": "71a05e52", "metadata": {}, "source": [ "### Counts of Bureau Dataframe(Bureau:숫자형)\n" ] }, { "cell_type": "code", "execution_count": 39, "id": "abc3b75b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
bureau_CREDIT_ACTIVE_Active_countbureau_CREDIT_ACTIVE_Active_count_normbureau_CREDIT_ACTIVE_Bad debt_countbureau_CREDIT_ACTIVE_Bad debt_count_normbureau_CREDIT_ACTIVE_Closed_countbureau_CREDIT_ACTIVE_Closed_count_normbureau_CREDIT_ACTIVE_Sold_countbureau_CREDIT_ACTIVE_Sold_count_normbureau_CREDIT_CURRENCY_currency 1_countbureau_CREDIT_CURRENCY_currency 1_count_norm...bureau_CREDIT_TYPE_Microloan_countbureau_CREDIT_TYPE_Microloan_count_normbureau_CREDIT_TYPE_Mobile operator loan_countbureau_CREDIT_TYPE_Mobile operator loan_count_normbureau_CREDIT_TYPE_Mortgage_countbureau_CREDIT_TYPE_Mortgage_count_normbureau_CREDIT_TYPE_Real estate loan_countbureau_CREDIT_TYPE_Real estate loan_count_normbureau_CREDIT_TYPE_Unknown type of loan_countbureau_CREDIT_TYPE_Unknown type of loan_count_norm
SK_ID_CURR
10000130.42857100.040.57142900.071.0...00.000.000.000.000.0
10000220.25000000.060.75000000.081.0...00.000.000.000.000.0
10000310.25000000.030.75000000.041.0...00.000.000.000.000.0
10000400.00000000.021.00000000.021.0...00.000.000.000.000.0
10000520.66666700.010.33333300.031.0...00.000.000.000.000.0
\n", "

5 rows × 46 columns

\n", "
" ], "text/plain": [ " bureau_CREDIT_ACTIVE_Active_count \\\n", "SK_ID_CURR \n", "100001 3 \n", "100002 2 \n", "100003 1 \n", "100004 0 \n", "100005 2 \n", "\n", " bureau_CREDIT_ACTIVE_Active_count_norm \\\n", "SK_ID_CURR \n", "100001 0.428571 \n", "100002 0.250000 \n", "100003 0.250000 \n", "100004 0.000000 \n", "100005 0.666667 \n", "\n", " bureau_CREDIT_ACTIVE_Bad debt_count \\\n", "SK_ID_CURR \n", "100001 0 \n", "100002 0 \n", "100003 0 \n", "100004 0 \n", "100005 0 \n", "\n", " bureau_CREDIT_ACTIVE_Bad debt_count_norm \\\n", "SK_ID_CURR \n", "100001 0.0 \n", "100002 0.0 \n", "100003 0.0 \n", "100004 0.0 \n", "100005 0.0 \n", "\n", " bureau_CREDIT_ACTIVE_Closed_count \\\n", "SK_ID_CURR \n", "100001 4 \n", "100002 6 \n", "100003 3 \n", "100004 2 \n", "100005 1 \n", "\n", " bureau_CREDIT_ACTIVE_Closed_count_norm \\\n", "SK_ID_CURR \n", "100001 0.571429 \n", "100002 0.750000 \n", "100003 0.750000 \n", "100004 1.000000 \n", "100005 0.333333 \n", "\n", " bureau_CREDIT_ACTIVE_Sold_count \\\n", "SK_ID_CURR \n", "100001 0 \n", "100002 0 \n", "100003 0 \n", "100004 0 \n", "100005 0 \n", "\n", " bureau_CREDIT_ACTIVE_Sold_count_norm \\\n", "SK_ID_CURR \n", "100001 0.0 \n", "100002 0.0 \n", "100003 0.0 \n", "100004 0.0 \n", "100005 0.0 \n", "\n", " bureau_CREDIT_CURRENCY_currency 1_count \\\n", "SK_ID_CURR \n", "100001 7 \n", "100002 8 \n", "100003 4 \n", "100004 2 \n", "100005 3 \n", "\n", " bureau_CREDIT_CURRENCY_currency 1_count_norm ... \\\n", "SK_ID_CURR ... \n", "100001 1.0 ... \n", "100002 1.0 ... \n", "100003 1.0 ... \n", "100004 1.0 ... \n", "100005 1.0 ... 
\n", "\n", " bureau_CREDIT_TYPE_Microloan_count \\\n", "SK_ID_CURR \n", "100001 0 \n", "100002 0 \n", "100003 0 \n", "100004 0 \n", "100005 0 \n", "\n", " bureau_CREDIT_TYPE_Microloan_count_norm \\\n", "SK_ID_CURR \n", "100001 0.0 \n", "100002 0.0 \n", "100003 0.0 \n", "100004 0.0 \n", "100005 0.0 \n", "\n", " bureau_CREDIT_TYPE_Mobile operator loan_count \\\n", "SK_ID_CURR \n", "100001 0 \n", "100002 0 \n", "100003 0 \n", "100004 0 \n", "100005 0 \n", "\n", " bureau_CREDIT_TYPE_Mobile operator loan_count_norm \\\n", "SK_ID_CURR \n", "100001 0.0 \n", "100002 0.0 \n", "100003 0.0 \n", "100004 0.0 \n", "100005 0.0 \n", "\n", " bureau_CREDIT_TYPE_Mortgage_count \\\n", "SK_ID_CURR \n", "100001 0 \n", "100002 0 \n", "100003 0 \n", "100004 0 \n", "100005 0 \n", "\n", " bureau_CREDIT_TYPE_Mortgage_count_norm \\\n", "SK_ID_CURR \n", "100001 0.0 \n", "100002 0.0 \n", "100003 0.0 \n", "100004 0.0 \n", "100005 0.0 \n", "\n", " bureau_CREDIT_TYPE_Real estate loan_count \\\n", "SK_ID_CURR \n", "100001 0 \n", "100002 0 \n", "100003 0 \n", "100004 0 \n", "100005 0 \n", "\n", " bureau_CREDIT_TYPE_Real estate loan_count_norm \\\n", "SK_ID_CURR \n", "100001 0.0 \n", "100002 0.0 \n", "100003 0.0 \n", "100004 0.0 \n", "100005 0.0 \n", "\n", " bureau_CREDIT_TYPE_Unknown type of loan_count \\\n", "SK_ID_CURR \n", "100001 0 \n", "100002 0 \n", "100003 0 \n", "100004 0 \n", "100005 0 \n", "\n", " bureau_CREDIT_TYPE_Unknown type of loan_count_norm \n", "SK_ID_CURR \n", "100001 0.0 \n", "100002 0.0 \n", "100003 0.0 \n", "100004 0.0 \n", "100005 0.0 \n", "\n", "[5 rows x 46 columns]" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bureau_counts = count_categorical(bureau, group_var = 'SK_ID_CURR', df_name = 'bureau')\n", "bureau_counts.head()" ] }, { "cell_type": "markdown", "id": "44d565fb", "metadata": {}, "source": [ "### Aggregated Stats of Bureau Dataframe (Bureau: numeric)\n" ] }, { "cell_type": "code", "execution_count": 40, "id": "b61d4a88", 
"metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SK_ID_CURRbureau_DAYS_CREDIT_countbureau_DAYS_CREDIT_meanbureau_DAYS_CREDIT_maxbureau_DAYS_CREDIT_minbureau_DAYS_CREDIT_sumbureau_CREDIT_DAY_OVERDUE_countbureau_CREDIT_DAY_OVERDUE_meanbureau_CREDIT_DAY_OVERDUE_maxbureau_CREDIT_DAY_OVERDUE_min...bureau_DAYS_CREDIT_UPDATE_countbureau_DAYS_CREDIT_UPDATE_meanbureau_DAYS_CREDIT_UPDATE_maxbureau_DAYS_CREDIT_UPDATE_minbureau_DAYS_CREDIT_UPDATE_sumbureau_AMT_ANNUITY_countbureau_AMT_ANNUITY_meanbureau_AMT_ANNUITY_maxbureau_AMT_ANNUITY_minbureau_AMT_ANNUITY_sum
01000017-735.000000-49-1572-514570.000...7-93.142857-6-155-65273545.35714310822.50.024817.5
11000028-874.000000-103-1437-699280.000...8-499.875000-7-1185-399970.0000000.00.00.0
21000034-1400.750000-606-2586-560340.000...4-816.000000-43-2131-32640NaNNaNNaN0.0
31000042-867.000000-408-1326-173420.000...2-532.000000-382-682-10640NaNNaNNaN0.0
41000053-190.666667-62-373-57230.000...3-54.333333-11-121-16331420.5000004261.50.04261.5
\n", "

5 rows × 61 columns

\n", "
" ], "text/plain": [ " SK_ID_CURR bureau_DAYS_CREDIT_count bureau_DAYS_CREDIT_mean \\\n", "0 100001 7 -735.000000 \n", "1 100002 8 -874.000000 \n", "2 100003 4 -1400.750000 \n", "3 100004 2 -867.000000 \n", "4 100005 3 -190.666667 \n", "\n", " bureau_DAYS_CREDIT_max bureau_DAYS_CREDIT_min bureau_DAYS_CREDIT_sum \\\n", "0 -49 -1572 -5145 \n", "1 -103 -1437 -6992 \n", "2 -606 -2586 -5603 \n", "3 -408 -1326 -1734 \n", "4 -62 -373 -572 \n", "\n", " bureau_CREDIT_DAY_OVERDUE_count bureau_CREDIT_DAY_OVERDUE_mean \\\n", "0 7 0.0 \n", "1 8 0.0 \n", "2 4 0.0 \n", "3 2 0.0 \n", "4 3 0.0 \n", "\n", " bureau_CREDIT_DAY_OVERDUE_max bureau_CREDIT_DAY_OVERDUE_min ... \\\n", "0 0 0 ... \n", "1 0 0 ... \n", "2 0 0 ... \n", "3 0 0 ... \n", "4 0 0 ... \n", "\n", " bureau_DAYS_CREDIT_UPDATE_count bureau_DAYS_CREDIT_UPDATE_mean \\\n", "0 7 -93.142857 \n", "1 8 -499.875000 \n", "2 4 -816.000000 \n", "3 2 -532.000000 \n", "4 3 -54.333333 \n", "\n", " bureau_DAYS_CREDIT_UPDATE_max bureau_DAYS_CREDIT_UPDATE_min \\\n", "0 -6 -155 \n", "1 -7 -1185 \n", "2 -43 -2131 \n", "3 -382 -682 \n", "4 -11 -121 \n", "\n", " bureau_DAYS_CREDIT_UPDATE_sum bureau_AMT_ANNUITY_count \\\n", "0 -652 7 \n", "1 -3999 7 \n", "2 -3264 0 \n", "3 -1064 0 \n", "4 -163 3 \n", "\n", " bureau_AMT_ANNUITY_mean bureau_AMT_ANNUITY_max bureau_AMT_ANNUITY_min \\\n", "0 3545.357143 10822.5 0.0 \n", "1 0.000000 0.0 0.0 \n", "2 NaN NaN NaN \n", "3 NaN NaN NaN \n", "4 1420.500000 4261.5 0.0 \n", "\n", " bureau_AMT_ANNUITY_sum \n", "0 24817.5 \n", "1 0.0 \n", "2 0.0 \n", "3 0.0 \n", "4 4261.5 \n", "\n", "[5 rows x 61 columns]" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bureau_agg = agg_numeric(bureau.drop(columns = ['SK_ID_BUREAU']), group_var = 'SK_ID_CURR', df_name = 'bureau')\n", "bureau_agg.head()" ] }, { "cell_type": "markdown", "id": "ef13358d", "metadata": {}, "source": [ "### Value counts of Bureau Balance dataframe by loan(Bureau Balance:숫자형)\n" ] }, { "cell_type": 
"code", "execution_count": 41, "id": "4cd87fc7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " bureau_balance_STATUS_0_count \\\n", "SK_ID_BUREAU \n", "5001709 0 \n", "5001710 5 \n", "5001711 3 \n", "5001712 10 \n", "5001713 0 \n", "\n", " bureau_balance_STATUS_0_count_norm \\\n", "SK_ID_BUREAU \n", "5001709 0.000000 \n", "5001710 0.060241 \n", "5001711 0.750000 \n", "5001712 0.526316 \n", "5001713 0.000000 \n", "\n", " bureau_balance_STATUS_1_count \\\n", "SK_ID_BUREAU \n", "5001709 0 \n", "5001710 0 \n", "5001711 0 \n", "5001712 0 \n", "5001713 0 \n", "\n", " bureau_balance_STATUS_1_count_norm \\\n", "SK_ID_BUREAU \n", "5001709 0.0 \n", "5001710 0.0 \n", "5001711 0.0 \n", "5001712 0.0 \n", "5001713 0.0 \n", "\n", " bureau_balance_STATUS_2_count \\\n", "SK_ID_BUREAU \n", "5001709 0 \n", "5001710 0 \n", "5001711 0 \n", "5001712 0 \n", "5001713 0 \n", "\n", " bureau_balance_STATUS_2_count_norm \\\n", "SK_ID_BUREAU \n", "5001709 0.0 \n", "5001710 0.0 \n", "5001711 0.0 \n", "5001712 0.0 \n", "5001713 0.0 \n", "\n", " bureau_balance_STATUS_3_count \\\n", "SK_ID_BUREAU \n", "5001709 0 \n", "5001710 0 \n", "5001711 0 \n", "5001712 0 \n", "5001713 0 \n", "\n", " bureau_balance_STATUS_3_count_norm \\\n", "SK_ID_BUREAU \n", "5001709 0.0 \n", "5001710 0.0 \n", "5001711 0.0 \n", "5001712 0.0 \n", "5001713 0.0 \n", "\n", " bureau_balance_STATUS_4_count \\\n", "SK_ID_BUREAU \n", "5001709 0 \n", "5001710 0 \n", "5001711 0 \n", "5001712 0 \n", "5001713 0 \n", "\n", " bureau_balance_STATUS_4_count_norm \\\n", "SK_ID_BUREAU \n", "5001709 0.0 \n", "5001710 0.0 \n", "5001711 0.0 \n", "5001712 0.0 \n", "5001713 0.0 \n", "\n", " bureau_balance_STATUS_5_count \\\n", "SK_ID_BUREAU \n", "5001709 0 \n", "5001710 0 \n", "5001711 0 \n", "5001712 0 \n", "5001713 0 \n", "\n", " bureau_balance_STATUS_5_count_norm \\\n", "SK_ID_BUREAU \n", "5001709 0.0 \n", "5001710 0.0 \n", "5001711 0.0 \n", "5001712 0.0 \n", "5001713 0.0 \n", "\n", " bureau_balance_STATUS_C_count \\\n", "SK_ID_BUREAU \n", "5001709 86 \n", "5001710 48 \n", "5001711 0 \n", "5001712 9 \n", "5001713 0 
\n", "\n", " bureau_balance_STATUS_C_count_norm \\\n", "SK_ID_BUREAU \n", "5001709 0.886598 \n", "5001710 0.578313 \n", "5001711 0.000000 \n", "5001712 0.473684 \n", "5001713 0.000000 \n", "\n", " bureau_balance_STATUS_X_count \\\n", "SK_ID_BUREAU \n", "5001709 11 \n", "5001710 30 \n", "5001711 1 \n", "5001712 0 \n", "5001713 22 \n", "\n", " bureau_balance_STATUS_X_count_norm \n", "SK_ID_BUREAU \n", "5001709 0.113402 \n", "5001710 0.361446 \n", "5001711 0.250000 \n", "5001712 0.000000 \n", "5001713 1.000000 " ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bureau_balance_counts = count_categorical(bureau_balance, group_var = 'SK_ID_BUREAU', df_name = 'bureau_balance')\n", "bureau_balance_counts.head()" ] }, { "cell_type": "markdown", "id": "0a4ea5ab", "metadata": {}, "source": [ "### Aggregated stats of Bureau Balance dataframe by loan (Bureau Balance: numeric)\n" ] }, { "cell_type": "code", "execution_count": 42, "id": "85d1ec7c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " SK_ID_BUREAU bureau_balance_MONTHS_BALANCE_count \\\n", "0 5001709 97 \n", "1 5001710 83 \n", "2 5001711 4 \n", "3 5001712 19 \n", "4 5001713 22 \n", "\n", " bureau_balance_MONTHS_BALANCE_mean bureau_balance_MONTHS_BALANCE_max \\\n", "0 -48.0 0 \n", "1 -41.0 0 \n", "2 -1.5 0 \n", "3 -9.0 0 \n", "4 -10.5 0 \n", "\n", " bureau_balance_MONTHS_BALANCE_min bureau_balance_MONTHS_BALANCE_sum \n", "0 -96 -4656 \n", "1 -82 -3403 \n", "2 -3 -6 \n", "3 -18 -171 \n", "4 -21 -231 " ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bureau_balance_agg = agg_numeric(bureau_balance, group_var = 'SK_ID_BUREAU', df_name = 'bureau_balance')\n", "bureau_balance_agg.head()" ] }, { "cell_type": "markdown", "id": "236bde5b", "metadata": {}, "source": [ "### Aggregated Stats of Bureau Balance by Client\n" ] }, { "cell_type": "code", "execution_count": 43, "id": "82aa6509", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " SK_ID_BUREAU bureau_balance_MONTHS_BALANCE_count \\\n", "0 5001709 97 \n", "1 5001710 83 \n", "2 5001711 4 \n", "3 5001712 19 \n", "4 5001713 22 \n", "\n", " bureau_balance_MONTHS_BALANCE_mean bureau_balance_MONTHS_BALANCE_max \\\n", "0 -48.0 0 \n", "1 -41.0 0 \n", "2 -1.5 0 \n", "3 -9.0 0 \n", "4 -10.5 0 \n", "\n", " bureau_balance_MONTHS_BALANCE_min bureau_balance_MONTHS_BALANCE_sum \\\n", "0 -96 -4656 \n", "1 -82 -3403 \n", "2 -3 -6 \n", "3 -18 -171 \n", "4 -21 -231 \n", "\n", " bureau_balance_STATUS_0_count bureau_balance_STATUS_0_count_norm \\\n", "0 0 0.000000 \n", "1 5 0.060241 \n", "2 3 0.750000 \n", "3 10 0.526316 \n", "4 0 0.000000 \n", "\n", " bureau_balance_STATUS_1_count bureau_balance_STATUS_1_count_norm ... \\\n", "0 0 0.0 ... \n", "1 0 0.0 ... \n", "2 0 0.0 ... \n", "3 0 0.0 ... \n", "4 0 0.0 ... \n", "\n", " bureau_balance_STATUS_3_count bureau_balance_STATUS_3_count_norm \\\n", "0 0 0.0 \n", "1 0 0.0 \n", "2 0 0.0 \n", "3 0 0.0 \n", "4 0 0.0 \n", "\n", " bureau_balance_STATUS_4_count bureau_balance_STATUS_4_count_norm \\\n", "0 0 0.0 \n", "1 0 0.0 \n", "2 0 0.0 \n", "3 0 0.0 \n", "4 0 0.0 \n", "\n", " bureau_balance_STATUS_5_count bureau_balance_STATUS_5_count_norm \\\n", "0 0 0.0 \n", "1 0 0.0 \n", "2 0 0.0 \n", "3 0 0.0 \n", "4 0 0.0 \n", "\n", " bureau_balance_STATUS_C_count bureau_balance_STATUS_C_count_norm \\\n", "0 86 0.886598 \n", "1 48 0.578313 \n", "2 0 0.000000 \n", "3 9 0.473684 \n", "4 0 0.000000 \n", "\n", " bureau_balance_STATUS_X_count bureau_balance_STATUS_X_count_norm \n", "0 11 0.113402 \n", "1 30 0.361446 \n", "2 1 0.250000 \n", "3 0 0.000000 \n", "4 22 1.000000 \n", "\n", "[5 rows x 22 columns]" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bureau_by_loan = bureau_balance_agg.merge(bureau_balance_counts, right_index = True, left_on = 'SK_ID_BUREAU', how = 'outer')\n", "bureau_by_loan.head()" ] }, { "cell_type": "code", "execution_count": 44, "id": 
"ba7b85a0", "metadata": {}, "outputs": [], "source": [ "bureau_by_loan = bureau_balance_agg.merge(bureau_balance_counts, right_index = True, left_on = 'SK_ID_BUREAU', how = 'outer')\n", "\n", "bureau_by_loan = bureau[['SK_ID_BUREAU', 'SK_ID_CURR']].merge(bureau_by_loan, on = 'SK_ID_BUREAU', how = 'left')\n", "\n", "bureau_balance_by_client = agg_numeric(bureau_by_loan.drop(columns = ['SK_ID_BUREAU']), group_var = 'SK_ID_CURR', df_name = 'client')" ] }, { "cell_type": "markdown", "id": "2d797493", "metadata": {}, "source": [ "# Insert Computed Features into Training Data\n" ] }, { "cell_type": "code", "execution_count": 45, "id": "8b1074a7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original Number of Features: 122\n" ] } ], "source": [ "original_features = list(train.columns)\n", "print('Original Number of Features: ', len(original_features))" ] }, { "cell_type": "code", "execution_count": 46, "id": "628208ca", "metadata": {}, "outputs": [], "source": [ "# Merge with the value counts of bureau\n", "train = train.merge(bureau_counts, on = 'SK_ID_CURR', how = 'left')\n", "\n", "# Merge with the stats of bureau\n", "train = train.merge(bureau_agg, on = 'SK_ID_CURR', how = 'left')\n", "\n", "# Merge with the monthly information grouped by client\n", "train = train.merge(bureau_balance_by_client, on = 'SK_ID_CURR', how = 'left')" ] }, { "cell_type": "code", "execution_count": 47, "id": "0f1bab49", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of features using previous loans from other institutions data: 333\n" ] } ], "source": [ "new_features = list(train.columns)\n", "print('Number of features using previous loans from other institutions data: ', len(new_features))" ] }, { "cell_type": "markdown", "id": "3931e10e", "metadata": {}, "source": [ "# 3. Feature Engineering Outcomes\n", "\n", "그 모든 작업이 끝난 후, 이제 우리가 만든 변수를 살펴보려고 합니다. 결측값의 백분율, 목표값과의 변수 상관관계 및 다른 변수와의 상관관계를 확인할 수 있습니다. 
변수들 간의 상관관계는 우리가 공선형 변수들, 즉 서로 높은 상관관계가 있는 변수들을 가지고 있는지를 보여줄 수 있습니다. 두 변수가 모두 있으면 중복될 수 있기 때문에 한 쌍의 동일 선형 변수를 제거하고자 하는 경우가 많습니다. 결측값의 백분율을 사용하여 존재하지 않는 대부분의 값을 가진 형상을 제거할 수도 있습니다. 피쳐 수를 줄이면 모델이 교육 중에 학습하는 데 도움이 되고 테스트 데이터로 더 잘 일반화할 수 있기 때문에 피쳐 선택은 앞으로 중요한 초점이 될 것입니다. \"차원의 곡선\"은 너무 많은 피쳐(차원의 너무 높음)로 인해 발생하는 문제에 주어진 이름입니다. 변수의 수가 증가함에 따라 이러한 변수와 목표값 사이의 관계를 학습하는 데 필요한 데이터 점의 수는 기하급수적으로 증가합니다.\n", "\n", "\n", "피쳐 선택은 모델이 학습하고 테스트 세트에 더 잘 일반화하는 데 도움이 되는 변수를 제거하는 프로세스입니다. 유용한 변수는 보존하면서 쓸모없는/중복된 변수는 제거하는 것이 목적입니다. 이 프로세스에 사용할 수 있는 도구는 여러 가지가 있지만 이 노트북에서는 결측값 비율이 높은 열과 서로 상관 관계가 높은 변수를 제거하는 방법을 고수할 것입니다. 나중에 Gradient Boosting Machine 또는 Random Forest와 같은 모델에서 반환된 피쳐 가져오기를 사용하여 피쳐 선택을 수행할 수 있습니다.\n" ] }, { "cell_type": "markdown", "id": "204b923b", "metadata": {}, "source": [ "## 3-1) Missing Values\n", "중요한 고려 사항은 데이터 프레임의 결측치입니다. 결측값이 너무 많은 열을 삭제해야 할 수 있습니다.\n" ] }, { "cell_type": "code", "execution_count": 48, "id": "00d00190", "metadata": {}, "outputs": [], "source": [ "def missing_values_table(df):\n", " mis_val = df.isnull().sum()\n", " \n", " mis_val_percent = 100 * df.isnull().sum() / len(df)\n", " \n", " mis_val_table = pd.concat([mis_val, mis_val_percent], axis=1)\n", " \n", " mis_val_table_ren_columns = mis_val_table.rename(\n", " columns = {0 : 'Missing Values', 1 : '% of Total Values'})\n", " \n", " mis_val_table_ren_columns = mis_val_table_ren_columns[\n", " mis_val_table_ren_columns.iloc[:,1] != 0].sort_values(\n", " '% of Total Values', ascending=False).round(1)\n", " \n", " print (\"Your selected dataframe has \" + str(df.shape[1]) + \" columns.\\n\" \n", " \"There are \" + str(mis_val_table_ren_columns.shape[0]) +\n", " \" columns that have missing values.\")\n", " \n", " return mis_val_table_ren_columns" ] }, { "cell_type": "code", "execution_count": 49, "id": "ad3a6a89", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Your selected dataframe has 333 columns.\n", "There are 278 columns that have missing values.\n" ] 
}, { "data": { "text/html": [ "
" ], "text/plain": [ " Missing Values \\\n", "bureau_AMT_ANNUITY_min 227502 \n", "bureau_AMT_ANNUITY_max 227502 \n", "bureau_AMT_ANNUITY_mean 227502 \n", "client_bureau_balance_STATUS_4_count_min 215280 \n", "client_bureau_balance_STATUS_3_count_norm_mean 215280 \n", "client_bureau_balance_MONTHS_BALANCE_count_min 215280 \n", "client_bureau_balance_STATUS_4_count_max 215280 \n", "client_bureau_balance_STATUS_4_count_mean 215280 \n", "client_bureau_balance_STATUS_3_count_norm_min 215280 \n", "client_bureau_balance_STATUS_3_count_norm_max 215280 \n", "\n", " % of Total Values \n", "bureau_AMT_ANNUITY_min 74.0 \n", "bureau_AMT_ANNUITY_max 74.0 \n", "bureau_AMT_ANNUITY_mean 74.0 \n", "client_bureau_balance_STATUS_4_count_min 70.0 \n", "client_bureau_balance_STATUS_3_count_norm_mean 70.0 \n", "client_bureau_balance_MONTHS_BALANCE_count_min 70.0 \n", "client_bureau_balance_STATUS_4_count_max 70.0 \n", "client_bureau_balance_STATUS_4_count_mean 70.0 \n", "client_bureau_balance_STATUS_3_count_norm_min 70.0 \n", "client_bureau_balance_STATUS_3_count_norm_max 70.0 " ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "missing_train = missing_values_table(train)\n", "missing_train.head(10)" ] }, { "cell_type": "code", "execution_count": 50, "id": "023d0f85", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "missing_train_vars = list(missing_train.index[missing_train['% of Total Values'] > 90])\n", "len(missing_train_vars)" ] }, { "cell_type": "markdown", "id": "16549e1a", "metadata": {}, "source": [ "### 3-1-1) Calculate Information for Testing Data\n" ] }, { "cell_type": "code", "execution_count": 51, "id": "0531f6aa", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Shape of Testing Data: (48744, 332)\n" ] } ], "source": [ "test = pd.read_csv('./input/application_test.csv')\n", "\n", "test = 
test.merge(bureau_counts, on = 'SK_ID_CURR', how = 'left')\n", "test = test.merge(bureau_agg, on = 'SK_ID_CURR', how = 'left')\n", "\n", "test = test.merge(bureau_balance_by_client, on = 'SK_ID_CURR', how = 'left')\n", "print('Shape of Testing Data: ', test.shape)" ] }, { "cell_type": "markdown", "id": "e9dc3c2e", "metadata": {}, "source": [ "We need to align the testing and training dataframes, which means matching up the columns so that both have exactly the same ones. This is not an issue here, but once variables are one-hot encoded the dataframes must be aligned to make sure they share the same columns.\n" ] }, { "cell_type": "code", "execution_count": 52, "id": "27e110fb", "metadata": {}, "outputs": [], "source": [ "train_labels = train['TARGET']\n", "\n", "# Align the dataframes, this will remove the 'TARGET' column\n", "train, test = train.align(test, join = 'inner', axis = 1)\n", "\n", "train['TARGET'] = train_labels" ] }, { "cell_type": "code", "execution_count": 53, "id": "49beb245", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training Data Shape: (307511, 333)\n", "Testing Data Shape: (48744, 332)\n" ] } ], "source": [ "print('Training Data Shape: ', train.shape)\n", "print('Testing Data Shape: ', test.shape)" ] }, { "cell_type": "markdown", "id": "1e5a3c2a", "metadata": {}, "source": [ "The dataframes now have the same columns (with the exception of the TARGET column in the training data). \n", "This means we can use them in a machine learning model, which needs to see the same columns in both the training and testing dataframes.\n", "Let's now look at the percentage of missing values in the testing data so we can figure out which columns should be dropped.\n" ] }, { "cell_type": "code", "execution_count": 54, "id": "0cba5928", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Your selected dataframe has 332 columns.\n", "There are 275 columns that have missing values.\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Missing Values % of Total Values\n", "COMMONAREA_MEDI 33495 68.7\n", "COMMONAREA_MODE 33495 68.7\n", "COMMONAREA_AVG 33495 68.7\n", "NONLIVINGAPARTMENTS_MEDI 33347 68.4\n", "NONLIVINGAPARTMENTS_AVG 33347 68.4\n", "NONLIVINGAPARTMENTS_MODE 33347 68.4\n", "FONDKAPREMONT_MODE 32797 67.3\n", "LIVINGAPARTMENTS_MEDI 32780 67.2\n", "LIVINGAPARTMENTS_MODE 32780 67.2\n", "LIVINGAPARTMENTS_AVG 32780 67.2" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "missing_test = missing_values_table(test)\n", "missing_test.head(10)" ] }, { "cell_type": "code", "execution_count": 55, "id": "cdd84a4f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "missing_test_vars = list(missing_test.index[missing_test['% of Total Values'] > 90])\n", "len(missing_test_vars)" ] }, { "cell_type": "code", "execution_count": 56, "id": "62e7cfba", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 0 columns with more than 90% missing in either the training or testing data.\n" ] } ], "source": [ "missing_columns = list(set(missing_test_vars + missing_train_vars))\n", "print('There are %d columns with more than 90%% missing in either the training or testing data.' % len(missing_columns))" ] }, { "cell_type": "code", "execution_count": 57, "id": "aad1462d", "metadata": {}, "outputs": [], "source": [ "# Drop the missing columns\n", "train = train.drop(columns = missing_columns)\n", "test = test.drop(columns = missing_columns)" ] }, { "cell_type": "markdown", "id": "4bd035b0", "metadata": {}, "source": [ "결측치가 90%를 초과하는 열이 없기 때문에 이 라운드에서는 열을 제거하지 못했습니다. 치수 축소를 위해 다른 피쳐 선택 방법을 적용해야 할 수도 있습니다.\n", "그러면 교육 데이터와 테스트 데이터를 모두 저장할 것입니다. 
누락된 열을 삭제하기 위해 다른 백분율을 시도하고 결과를 비교하는 것이 좋습니다.\n" ] }, { "cell_type": "code", "execution_count": 58, "id": "885df99d", "metadata": {}, "outputs": [], "source": [ "train.to_csv('train_bureau_raw.csv', index = False)\n", "test.to_csv('test_bureau_raw.csv', index = False)" ] }, { "cell_type": "markdown", "id": "8c770a84", "metadata": {}, "source": [ "## 3-2) Correlations\n", "\n", "\n", "먼저 변수와 대상의 상관관계를 살펴보겠습니다. 우리가 생성한 모든 변수에서 교육 데이터에 이미 존재하는 변수보다 더 큰 상관 관계를 가질 수 있습니다(응용프로그램에서).\n" ] }, { "cell_type": "code", "execution_count": 59, "id": "63307154", "metadata": {}, "outputs": [], "source": [ "# Calculate all correlations in dataframe\n", "corrs = train.corr()" ] }, { "cell_type": "code", "execution_count": 60, "id": "5d53b46e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " TARGET\n", "TARGET 1.000000\n", "bureau_DAYS_CREDIT_mean 0.089729\n", "client_bureau_balance_MONTHS_BALANCE_min_mean 0.089038\n", "DAYS_BIRTH 0.078239\n", "bureau_CREDIT_ACTIVE_Active_count_norm 0.077356\n", "client_bureau_balance_MONTHS_BALANCE_mean_mean 0.076424\n", "bureau_DAYS_CREDIT_min 0.075248\n", "client_bureau_balance_MONTHS_BALANCE_min_min 0.073225\n", "client_bureau_balance_MONTHS_BALANCE_sum_mean 0.072606\n", "bureau_DAYS_CREDIT_UPDATE_mean 0.068927" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corrs = corrs.sort_values('TARGET', ascending = False)\n", "\n", "# Ten most positive correlations\n", "pd.DataFrame(corrs['TARGET'].head(10))" ] }, { "cell_type": "code", "execution_count": 61, "id": "d4e052e4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " TARGET\n", "client_bureau_balance_MONTHS_BALANCE_count_min -0.048224\n", "client_bureau_balance_STATUS_C_count_norm_mean -0.055936\n", "client_bureau_balance_STATUS_C_count_max -0.061083\n", "client_bureau_balance_STATUS_C_count_mean -0.062954\n", "client_bureau_balance_MONTHS_BALANCE_count_max -0.068792\n", "bureau_CREDIT_ACTIVE_Closed_count_norm -0.079369\n", "client_bureau_balance_MONTHS_BALANCE_count_mean -0.080193\n", "EXT_SOURCE_1 -0.155317\n", "EXT_SOURCE_2 -0.160472\n", "EXT_SOURCE_3 -0.178919" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Ten most negative correlations\n", "pd.DataFrame(corrs['TARGET'].dropna().tail(10))" ] }, { "cell_type": "markdown", "id": "0b577132", "metadata": {}, "source": [ "대상과의 상관관계가 가장 높은 변수(상관성이 1인 대상 제외)는 우리가 생성한 변수입니다. 그러나 변수가 상관관계가 있다고 해서 유용하다는 것은 아니며, 수백 개의 새로운 변수를 생성하면 일부는 단순히 무작위 노이즈 때문에 대상과 상관 관계가 있다는 것을 기억해야 합니다.\n", "\n", "회의적으로 상관 관계를 살펴보면 새로 생성된 변수 중 몇 개가 유용할 수 있습니다. 변수의 \"유용성\"을 평가하기 위해 모형에서 반환되는 피쳐 중요도를 살펴보겠습니다. 호기심을 위해(그리고 함수를 이미 작성했기 때문에) 새로 생성된 두 변수에 대한 kde 그림을 만들 수 있습니다." ] }, { "cell_type": "code", "execution_count": 62, "id": "ce832ca6", "metadata": {}, "outputs": [], "source": [ "# kde_target(var_name='client_bureau_balance_counts_mean', df=train) error" ] }, { "cell_type": "code", "execution_count": 63, "id": "3654468b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The correlation between bureau_CREDIT_ACTIVE_Active_count_norm and the TARGET is 0.0774\n", "Median value for loan that was not repaid = 0.5000\n", "Median value for loan that was repaid = 0.3636\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "kde_target(var_name='bureau_CREDIT_ACTIVE_Active_count_norm', df=train)" ] }, { "cell_type": "markdown", "id": "bb49eb4a", "metadata": {}, "source": [ "음, 이 배포판은 어디에나 있습니다. 이 변수는 CREMIT_ACTIVE 값이 활성인 이전 대출 수를 고객에 대한 총 이전 대출 수로 나눈 값을 나타냅니다. 여기 상관관계가 너무 약해서 결론을 내려서는 안 될 것 같아요!" ] }, { "cell_type": "markdown", "id": "8bb2d182", "metadata": {}, "source": [ "### 3-2-1) Collinear Variables\n", "변수와 목표 변수의 상관관계뿐만 아니라 각 변수와 다른 변수의 상관관계도 계산할 수 있습니다. 이를 통해 데이터에서 제거해야 할 높은 공선형 변수가 있는지 확인할 수 있습니다.\n", "다른 변수와의 상관 관계가 0.8보다 큰 변수를 살펴보겠습니다.\n" ] }, { "cell_type": "code", "execution_count": 64, "id": "9362e5c6", "metadata": {}, "outputs": [], "source": [ "# Set the threshold\n", "threshold = 0.8\n", "\n", "# Empty dictionary to hold correlated variables\n", "above_threshold_vars = {}\n", "\n", "# For each column, record the variables that are above the threshold\n", "for col in corrs:\n", " above_threshold_vars[col] = list(corrs.index[corrs[col] > threshold])" ] }, { "cell_type": "markdown", "id": "09c94c40", "metadata": {}, "source": [ "이러한 상관 관계가 높은 변수 쌍에 대해 두 변수 중 하나만 제거하려고 합니다. 
다음 코드는 각 쌍 중 하나만 추가하여 제거할 변수 집합을 만듭니다.\n" ] }, { "cell_type": "code", "execution_count": 65, "id": "356f3c7c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of columns to remove: 134\n" ] } ], "source": [ "# Track columns to remove and columns already examined\n", "cols_to_remove = []\n", "cols_seen = []\n", "cols_to_remove_pair = []\n", "\n", "# Iterate through columns and correlated columns\n", "for key, value in above_threshold_vars.items():\n", " # Keep track of columns already examined\n", " cols_seen.append(key)\n", " for x in value:\n", " if x == key:\n", " next\n", " else:\n", " # Only want to remove one in a pair\n", " if x not in cols_seen:\n", " cols_to_remove.append(x)\n", " cols_to_remove_pair.append(key)\n", " \n", "cols_to_remove = list(set(cols_to_remove))\n", "print('Number of columns to remove: ', len(cols_to_remove))" ] }, { "cell_type": "markdown", "id": "df46c271", "metadata": {}, "source": [ "교육 및 테스트 데이터 세트 모두에서 이러한 열을 제거할 수 있습니다. 이러한 변수를 제거한 후 이러한 변수를 유지하는 성능(앞서 저장한 원시 csv 파일)을 비교해야 합니다.\n" ] }, { "cell_type": "code", "execution_count": 66, "id": "719a7ed5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training Corrs Removed Shape: (307511, 199)\n", "Testing Corrs Removed Shape: (48744, 198)\n" ] } ], "source": [ "train_corrs_removed = train.drop(columns = cols_to_remove)\n", "test_corrs_removed = test.drop(columns = cols_to_remove)\n", "\n", "print('Training Corrs Removed Shape: ', train_corrs_removed.shape)\n", "print('Testing Corrs Removed Shape: ', test_corrs_removed.shape)" ] }, { "cell_type": "code", "execution_count": 67, "id": "40f805ad", "metadata": {}, "outputs": [], "source": [ "train_corrs_removed.to_csv('train_bureau_corrs_removed.csv', index = False)\n", "test_corrs_removed.to_csv('test_bureau_corrs_removed.csv', index = False)" ] }, { "cell_type": "markdown", "id": "5471acf1", "metadata": {}, "source": [ "# 4. 
Modeling\n" ] }, { "cell_type": "markdown", "id": "4a805a03", "metadata": {}, "source": [ "이러한 새로운 데이터 세트의 성능을 실제로 테스트하기 위해 이 데이터셋을 머신러닝에 사용해 보겠습니다. 여기서는 다른 노트북에서 개발한 기능을 사용하여 기능(상관성이 높은 변수가 제거된 원시 버전)을 비교합니다. 우리는 이런 종류의 실험을 할 수 있고, 경쟁사에 제출되었을 때 이 기능에 있는 애플리케이션 데이터들의 성능만 제어할 수 있을 것입니다. 이 성능을 이미 기록했으므로 제어 및 두 가지 테스트 조건을 나열할 수 있습니다.\n", "\n", "모든 데이터셋에 대해 정확한 하이퍼 파라미터와 함께 아래 표시된 모델을 사용하십시오.\n", "\n", "컨트롤: 응용 프로그램 파일의 데이터만 해당됩니다.\n", "테스트 1: 응용 프로그램 파일의 데이터와 bureau_balance 파일의 모든 데이터가 기록됩니다.\n", "테스트 2: 애플리케이션 파일의 데이터와 상관 관계가 높은 변수가 있는 bure_balance 파일을 모두 제거합니다.\n", " " ] }, { "cell_type": "code", "execution_count": 74, "id": "23e8e231", "metadata": {}, "outputs": [], "source": [ "import lightgbm as lgb\n", "\n", "from sklearn.model_selection import KFold\n", "from sklearn.metrics import roc_auc_score\n", "from sklearn.preprocessing import LabelEncoder\n", "\n", "import gc\n", "\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 87, "id": "b0c67879", "metadata": {}, "outputs": [], "source": [ "def model(features, test_features, encoding = 'ohe', n_folds = 5):\n", " \n", " \"\"\"Train and test a light gradient boosting model using\n", " cross validation. \n", " \n", " Parameters\n", " --------\n", " features (pd.DataFrame): \n", " dataframe of training features to use \n", " for training a model. Must include the TARGET column.\n", " test_features (pd.DataFrame): \n", " dataframe of testing features to use\n", " for making predictions with the model. \n", " encoding (str, default = 'ohe'): \n", " method for encoding categorical variables. 
Either 'ohe' for one-hot encoding or 'le' for integer label encoding\n", " n_folds (int, default = 5): number of folds to use for cross validation\n", " \n", " Return\n", " --------\n", " submission (pd.DataFrame): \n", " dataframe with `SK_ID_CURR` and `TARGET` probabilities\n", " predicted by the model.\n", " feature_importances (pd.DataFrame): \n", " dataframe with the feature importances from the model.\n", " valid_metrics (pd.DataFrame): \n", " dataframe with training and validation metrics (ROC AUC) for each fold and overall.\n", " \n", " \"\"\"\n", " \n", " # Extract the ids\n", " train_ids = features['SK_ID_CURR']\n", " test_ids = test_features['SK_ID_CURR']\n", " \n", " # Extract the labels for training\n", " labels = features['TARGET']\n", " \n", " # Remove the ids and target\n", " features = features.drop(columns = ['SK_ID_CURR', 'TARGET'])\n", " test_features = test_features.drop(columns = ['SK_ID_CURR'])\n", " \n", " \n", " # One Hot Encoding\n", " if encoding == 'ohe':\n", " features = pd.get_dummies(features)\n", " test_features = pd.get_dummies(test_features)\n", " \n", " # Align the dataframes by the columns\n", " features, test_features = features.align(test_features, join = 'inner', axis = 1)\n", " \n", " # No categorical indices to record\n", " cat_indices = 'auto'\n", " \n", " # Integer label encoding\n", " elif encoding == 'le':\n", " \n", " # Create a label encoder\n", " label_encoder = LabelEncoder()\n", " \n", " # List for storing categorical indices\n", " cat_indices = []\n", " \n", " # Iterate through each column\n", " for i, col in enumerate(features):\n", " if features[col].dtype == 'object':\n", " # Map the categorical features to integers\n", " features[col] = label_encoder.fit_transform(np.array(features[col].astype(str)).reshape((-1,)))\n", " test_features[col] = label_encoder.transform(np.array(test_features[col].astype(str)).reshape((-1,)))\n", "\n", " # Record the categorical indices\n", " cat_indices.append(i)\n", " \n", " # 
Catch error if label encoding scheme is not valid\n", "    else:\n", "        raise ValueError(\"Encoding must be either 'ohe' or 'le'\")\n", "        \n", "    print('Training Data Shape: ', features.shape)\n", "    print('Testing Data Shape: ', test_features.shape)\n", "    \n", "    # Extract feature names\n", "    feature_names = list(features.columns)\n", "    \n", "    # Convert to np arrays\n", "    features = np.array(features)\n", "    test_features = np.array(test_features)\n", "    \n", "    # Create the kfold object\n", "    k_fold = KFold(n_splits = n_folds, shuffle = True, random_state = 50)\n", "    \n", "    # Empty array for feature importances\n", "    feature_importance_values = np.zeros(len(feature_names))\n", "    \n", "    # Empty array for test predictions\n", "    test_predictions = np.zeros(test_features.shape[0])\n", "    \n", "    # Empty array for out of fold validation predictions\n", "    out_of_fold = np.zeros(features.shape[0])\n", "    \n", "    # Lists for recording validation and training scores\n", "    valid_scores = []\n", "    train_scores = []\n", "    \n", "    # Iterate through each fold\n", "    for train_indices, valid_indices in k_fold.split(features):\n", "        \n", "        # Training data for the fold\n", "        train_features, train_labels = features[train_indices], labels[train_indices]\n", "        # Validation data for the fold\n", "        valid_features, valid_labels = features[valid_indices], labels[valid_indices]\n", "        \n", "        # Create the model\n", "        model = lgb.LGBMClassifier(n_estimators=10000, objective = 'binary', \n", "                                   class_weight = 'balanced', learning_rate = 0.05, \n", "                                   reg_alpha = 0.1, reg_lambda = 0.1, \n", "                                   subsample = 0.8, n_jobs = -1, random_state = 50)\n", "        \n", "        # Train the model\n", "        # Note: LightGBM 4.0 removed early_stopping_rounds and verbose from fit();\n", "        # newer versions take callbacks (lgb.early_stopping, lgb.log_evaluation) instead\n", "        model.fit(train_features, train_labels, eval_metric = 'auc',\n", "                  eval_set = [(valid_features, valid_labels), (train_features, train_labels)],\n", "                  eval_names = ['valid', 'train'], categorical_feature = cat_indices,\n", "                  early_stopping_rounds = 100, verbose = 200)\n", "        \n", "        # Record the best iteration\n", "        best_iteration = 
model.best_iteration_\n", "        \n", "        # Record the feature importances\n", "        feature_importance_values += model.feature_importances_ / k_fold.n_splits\n", "        \n", "        # Make predictions\n", "        test_predictions += model.predict_proba(test_features, num_iteration = best_iteration)[:, 1] / k_fold.n_splits\n", "        \n", "        # Record the out of fold predictions\n", "        out_of_fold[valid_indices] = model.predict_proba(valid_features, num_iteration = best_iteration)[:, 1]\n", "        \n", "        # Record the best score\n", "        valid_score = model.best_score_['valid']['auc']\n", "        train_score = model.best_score_['train']['auc']\n", "        \n", "        valid_scores.append(valid_score)\n", "        train_scores.append(train_score)\n", "        \n", "        # Clean up memory\n", "        gc.enable()\n", "        del model, train_features, valid_features\n", "        gc.collect()\n", "        \n", "    # Make the submission dataframe\n", "    submission = pd.DataFrame({'SK_ID_CURR': test_ids, 'TARGET': test_predictions})\n", "    \n", "    # Make the feature importance dataframe\n", "    feature_importances = pd.DataFrame({'feature': feature_names, 'importance': feature_importance_values})\n", "    \n", "    # Overall validation score\n", "    valid_auc = roc_auc_score(labels, out_of_fold)\n", "    \n", "    # Add the overall scores to the metrics\n", "    valid_scores.append(valid_auc)\n", "    train_scores.append(np.mean(train_scores))\n", "    \n", "    # Needed for creating dataframe of validation scores\n", "    fold_names = list(range(n_folds))\n", "    fold_names.append('overall')\n", "    \n", "    # Dataframe of validation scores\n", "    metrics = pd.DataFrame({'fold': fold_names,\n", "                            'train': train_scores,\n", "                            'valid': valid_scores}) \n", "    \n", "    return submission, feature_importances, metrics" ] },
{ "cell_type": "code", "execution_count": 80, "id": "16858a9a", "metadata": {}, "outputs": [], "source": [ "def plot_feature_importances(df):\n", "    \"\"\"\n", "    Plot importances returned by a model. This can work with any measure of\n", "    feature importance provided that higher importance is better. 
\n", "    \n", "    Args:\n", "        df (dataframe): feature importances. Must have the features in a column\n", "        called `feature` and the importances in a column called `importance`\n", "        \n", "    Returns:\n", "        shows a plot of the 15 most important features\n", "        \n", "        df (dataframe): feature importances sorted by importance (highest to lowest) \n", "        with a column for normalized importance\n", "    \"\"\"\n", "    \n", "    # Sort features according to importance\n", "    df = df.sort_values('importance', ascending = False).reset_index()\n", "    \n", "    # Normalize the feature importances to add up to one\n", "    df['importance_normalized'] = df['importance'] / df['importance'].sum()\n", "\n", "    # Make a horizontal bar chart of feature importances\n", "    plt.figure(figsize = (10, 6))\n", "    ax = plt.subplot()\n", "    \n", "    # Need to reverse the index to plot most important on top\n", "    ax.barh(list(reversed(list(df.index[:15]))), \n", "            df['importance_normalized'].head(15), \n", "            align = 'center', edgecolor = 'k')\n", "    \n", "    # Set the yticks and labels\n", "    ax.set_yticks(list(reversed(list(df.index[:15]))))\n", "    ax.set_yticklabels(df['feature'].head(15))\n", "    \n", "    # Plot labeling\n", "    plt.xlabel('Normalized Importance'); plt.title('Feature Importances')\n", "    plt.show()\n", "    \n", "    return df" ] },
{ "cell_type": "code", "execution_count": null, "id": "5596ead6", "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "markdown", "id": "1f5f6ad2", "metadata": {}, "source": [ "Control\n", "The first step in any experiment is establishing a control. 
For this we use the function defined above (which implements a gradient boosting machine model) and the single main data source (application).\n" ] },
{ "cell_type": "code", "execution_count": 81, "id": "22aafc24", "metadata": {}, "outputs": [], "source": [ "train_control = pd.read_csv('./input/application_train.csv')\n", "test_control = pd.read_csv('./input/application_test.csv')" ] },
{ "cell_type": "code", "execution_count": null, "id": "68b15d56", "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "markdown", "id": "aff239f7", "metadata": {}, "source": [ "Fortunately, once we have taken the time to write a function, using it is simple (if this notebook has a central theme, it is to use functions to make work simpler and reproducible!). The function above returns a `submission` dataframe that can be uploaded to the competition, a `fi` dataframe of feature importances, and a `metrics` dataframe with training and validation performance.\n" ] },
{ "cell_type": "code", "execution_count": 88, "id": "45d05b54", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training Data Shape: (307511, 241)\n", "Testing Data Shape: (48744, 241)\n", "[200]\ttrain's auc: 0.7989\ttrain's binary_logloss: 0.547642\tvalid's auc: 0.755463\tvalid's binary_logloss: 0.563361\n", "[400]\ttrain's auc: 0.82864\ttrain's binary_logloss: 0.518235\tvalid's auc: 0.755594\tvalid's binary_logloss: 0.544951\n", "[200]\ttrain's auc: 0.798638\ttrain's binary_logloss: 0.547974\tvalid's auc: 0.758354\tvalid's binary_logloss: 0.56326\n", "[200]\ttrain's auc: 0.7977\ttrain's binary_logloss: 0.549358\tvalid's auc: 0.763287\tvalid's binary_logloss: 0.564505\n", "[200]\ttrain's auc: 0.798947\ttrain's binary_logloss: 0.547854\tvalid's auc: 0.757823\tvalid's binary_logloss: 0.562315\n", "[200]\ttrain's auc: 0.798357\ttrain's binary_logloss: 0.548311\tvalid's auc: 0.758237\tvalid's binary_logloss: 0.564466\n" ] } ], "source": [ "submission, fi, metrics = model(train_control, test_control)" ] },
{ "cell_type": "code", "execution_count": 89, "id": "16723891", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>fold</th><th>train</th><th>valid</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>0</th><td>0</td><td>0.815791</td><td>0.755755</td></tr>\n", "    <tr><th>1</th><td>1</td><td>0.811912</td><td>0.758533</td></tr>\n", "    <tr><th>2</th><td>2</td><td>0.811252</td><td>0.763822</td></tr>\n", "    <tr><th>3</th><td>3</td><td>0.805899</td><td>0.758345</td></tr>\n", "    <tr><th>4</th><td>4</td><td>0.807459</td><td>0.758535</td></tr>\n", "    <tr><th>5</th><td>overall</td><td>0.810463</td><td>0.759002</td></tr>\n", "  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "      fold     train     valid\n", "0        0  0.815791  0.755755\n", "1        1  0.811912  0.758533\n", "2        2  0.811252  0.763822\n", "3        3  0.805899  0.758345\n", "4        4  0.807459  0.758535\n", "5  overall  0.810463  0.759002" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "metrics" ] },
{ "cell_type": "markdown", "id": "1bbd5d04", "metadata": {}, "source": [ "The control slightly overfits, because the training score is higher than the validation score. We can address this in later notebooks when we look at regularization (this model already applies some regularization through `reg_lambda` and `reg_alpha` as well as early stopping).\n", "\n", "We can visualize the feature importances with another function, `plot_feature_importances`. The feature importances may be useful when it comes time for feature selection.\n" ] },
{ "cell_type": "code", "execution_count": 91, "id": "9012a801", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fi_sorted = plot_feature_importances(fi)" ] },
{ "cell_type": "code", "execution_count": 92, "id": "836f0859", "metadata": {}, "outputs": [], "source": [ "submission.to_csv('control.csv', index = False)" ] },
{ "cell_type": "markdown", "id": "d71ee5d8", "metadata": {}, "source": [ "The control scores 0.745 when submitted to the competition.\n", "\n" ] },
{ "cell_type": "markdown", "id": "ec6f8a8f", "metadata": {}, "source": [ "Test One\n", "\n", "Let's run the first test. We only need to pass the data to the function, which does most of the work for us.\n" ] },
{ "cell_type": "code", "execution_count": 93, "id": "ea1af014", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training Data Shape: (307511, 241)\n", "Testing Data Shape: (48744, 241)\n", "[200]\ttrain's auc: 0.7989\ttrain's binary_logloss: 0.547642\tvalid's auc: 0.755463\tvalid's binary_logloss: 0.563361\n", "[400]\ttrain's auc: 0.82864\ttrain's binary_logloss: 0.518235\tvalid's auc: 0.755594\tvalid's binary_logloss: 0.544951\n", "[200]\ttrain's auc: 0.798638\ttrain's binary_logloss: 0.547974\tvalid's auc: 0.758354\tvalid's binary_logloss: 0.56326\n", "[200]\ttrain's auc: 0.7977\ttrain's binary_logloss: 0.549358\tvalid's auc: 0.763287\tvalid's binary_logloss: 0.564505\n", "[200]\ttrain's auc: 0.798947\ttrain's binary_logloss: 0.547854\tvalid's auc: 0.757823\tvalid's binary_logloss: 0.562315\n", "[200]\ttrain's auc: 0.798357\ttrain's binary_logloss: 0.548311\tvalid's auc: 0.758237\tvalid's binary_logloss: 0.564466\n" ] } ], "source": [ "submission_raw, fi_raw, metrics_raw = model(train, test)" ] },
{ "cell_type": "code", "execution_count": 94, "id": "edc752d8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>fold</th><th>train</th><th>valid</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>0</th><td>0</td><td>0.815791</td><td>0.755755</td></tr>\n", "    <tr><th>1</th><td>1</td><td>0.811912</td><td>0.758533</td></tr>\n", "    <tr><th>2</th><td>2</td><td>0.811252</td><td>0.763822</td></tr>\n", "    <tr><th>3</th><td>3</td><td>0.805899</td><td>0.758345</td></tr>\n", "    <tr><th>4</th><td>4</td><td>0.807459</td><td>0.758535</td></tr>\n", "    <tr><th>5</th><td>overall</td><td>0.810463</td><td>0.759002</td></tr>\n", "  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "      fold     train     valid\n", "0        0  0.815791  0.755755\n", "1        1  0.811912  0.758533\n", "2        2  0.811252  0.763822\n", "3        3  0.805899  0.758345\n", "4        4  0.807459  0.758535\n", "5  overall  0.810463  0.759002" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "metrics_raw" ] },
{ "cell_type": "markdown", "id": "358fb763", "metadata": {}, "source": [ "Based on these numbers, the engineered features perform better than the control. However, we have to submit the predictions to the leaderboard before we can say whether this better validation performance transfers to the test data.\n" ] },
{ "cell_type": "code", "execution_count": 95, "id": "42f9185b", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fi_raw_sorted = plot_feature_importances(fi_raw)" ] },
{ "cell_type": "markdown", "id": "9ccb7a56", "metadata": {}, "source": [ "Examining the feature importances, it looks as if a few of the features we constructed are among the most important. Let's find what percentage of the top 100 most important features we made in this notebook. Rather than comparing against the original features, however, we need to compare against the one-hot encoded original features. These are already recorded in `fi`.\n" ] },
{ "cell_type": "code", "execution_count": 96, "id": "acdb37b4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "% of Top 100 Features created from the bureau data = 0.00\n" ] } ], "source": [ "top_100 = list(fi_raw_sorted['feature'])[:100]\n", "new_features = [x for x in top_100 if x not in list(fi['feature'])]\n", "\n", "print('%% of Top 100 Features created from the bureau data = %d.00' % len(new_features))" ] },
{ "cell_type": "markdown", "id": "d72730b2", "metadata": {}, "source": [ "Over half of the top 100 features were made by us! That should give us confidence that all the hard work we did was worthwhile.\n" ] },
{ "cell_type": "code", "execution_count": 97, "id": "4c360c85", "metadata": {}, "outputs": [], "source": [ "submission_raw.to_csv('test_one.csv', index = False)" ] },
{ "cell_type": "markdown", "id": "8b6dc93b", "metadata": {}, "source": [ "Test one scores 0.759 when submitted to the competition.\n", "\n" ] },
{ "cell_type": "markdown", "id": "0d42c8bc", "metadata": {}, "source": [ "Test Two\n", "\n", "That was easy, so let's do another run! 
Same as before, but with the highly collinear variables removed.\n", "\n" ] },
{ "cell_type": "code", "execution_count": 98, "id": "3d6ca591", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training Data Shape: (307511, 318)\n", "Testing Data Shape: (48744, 318)\n", "[200]\ttrain's auc: 0.807062\ttrain's binary_logloss: 0.539926\tvalid's auc: 0.761756\tvalid's binary_logloss: 0.555974\n", "[200]\ttrain's auc: 0.807403\ttrain's binary_logloss: 0.539835\tvalid's auc: 0.762487\tvalid's binary_logloss: 0.556178\n", "[200]\ttrain's auc: 0.806206\ttrain's binary_logloss: 0.541195\tvalid's auc: 0.766958\tvalid's binary_logloss: 0.557491\n", "[400]\ttrain's auc: 0.83864\ttrain's binary_logloss: 0.508571\tvalid's auc: 0.767385\tvalid's binary_logloss: 0.536782\n", "[200]\ttrain's auc: 0.806561\ttrain's binary_logloss: 0.540767\tvalid's auc: 0.763307\tvalid's binary_logloss: 0.556131\n", "[200]\ttrain's auc: 0.807104\ttrain's binary_logloss: 0.540208\tvalid's auc: 0.760759\tvalid's binary_logloss: 0.557689\n" ] } ], "source": [ "submission_corrs, fi_corrs, metrics_corr = model(train_corrs_removed, test_corrs_removed)" ] },
{ "cell_type": "code", "execution_count": 99, "id": "69ec06e0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>fold</th><th>train</th><th>valid</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>0</th><td>0</td><td>0.819890</td><td>0.762222</td></tr>\n", "    <tr><th>1</th><td>1</td><td>0.816296</td><td>0.762811</td></tr>\n", "    <tr><th>2</th><td>2</td><td>0.835801</td><td>0.767546</td></tr>\n", "    <tr><th>3</th><td>3</td><td>0.807812</td><td>0.763487</td></tr>\n", "    <tr><th>4</th><td>4</td><td>0.823635</td><td>0.761030</td></tr>\n", "    <tr><th>5</th><td>overall</td><td>0.820687</td><td>0.763372</td></tr>\n", "  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "      fold     train     valid\n", "0        0  0.819890  0.762222\n", "1        1  0.816296  0.762811\n", "2        2  0.835801  0.767546\n", "3        3  0.807812  0.763487\n", "4        4  0.823635  0.761030\n", "5  overall  0.820687  0.763372" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "metrics_corr" ] },
{ "cell_type": "markdown", "id": "5ddb0fe7", "metadata": {}, "source": [ "These results are better than the control, but slightly lower than the raw features." ] },
{ "cell_type": "code", "execution_count": 100, "id": "efd3fccc", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fi_corrs_sorted = plot_feature_importances(fi_corrs)" ] },
{ "cell_type": "code", "execution_count": 101, "id": "cb5642af", "metadata": {}, "outputs": [], "source": [ "submission_corrs.to_csv('test_two.csv', index = False)" ] },
{ "cell_type": "markdown", "id": "68e2751a", "metadata": {}, "source": [ "Test Two scores 0.753 when submitted to the competition.\n" ] },
{ "cell_type": "markdown", "id": "c5a6b135", "metadata": {}, "source": [ "Results\n", "\n", "After all that work, we can say that including the extra information did improve performance! The model is not optimized for our data, but we still saw a noticeable improvement over the original dataset when using the calculated features. To summarize the performance formally: on the competition leaderboard the control scored 0.745, test one 0.759, and test two 0.753.\n", "\n", "All of our hard work translates to a small improvement of 0.014 ROC AUC over the original data on the test set (0.759 against 0.745). Removing the highly collinear variables slightly decreases performance, so we will want to consider a different method of feature selection. Moreover, we can say that some of the features we built are among the most important as judged by the model.\n", "\n", "In a competition such as this, even an improvement of this size is enough to move up hundreds of places on the leaderboard. By making numerous small improvements like those in this notebook, we can gradually achieve better and better performance. I encourage others to use the results here to make their own improvements, and I will continue to document the steps I take in order to help others.\n", "\n" ] },
{ "cell_type": "code", "execution_count": null, "id": "f9f156e9", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "7d5f7250", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "8625d022", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "d1175c78", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "0a069eb3", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "0f70d698", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "a314cced", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "8cb3b527", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, 
"id": "804ee574", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "6a8b0509", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "f91f5ef5", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "edc7c765", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "80a258a5", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }