{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Partitional Clustering (K-Means Clustering)"
]
},
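{
"cell_type": "markdown",
"metadata": {},
"source": [
"K-means, the partitional method used in this notebook through scikit-learn's `KMeans`, alternates two steps until the assignments stop changing (Lloyd's algorithm): assign every point to its nearest centroid, then move each centroid to the mean of the points assigned to it.\n",
"\n",
"The next cell is a minimal NumPy sketch of that loop on toy 2-D data, for illustration only. The helper `lloyd_kmeans` and the data are hypothetical and not part of the original analysis; scikit-learn's `KMeans` additionally uses k-means++ initialization and several restarts (`n_init`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def lloyd_kmeans(X, k, n_iter=100, seed=0):\n",
"    # Hypothetical helper: plain Lloyd's algorithm with random initial centroids\n",
"    rng = np.random.default_rng(seed)\n",
"    centroids = X[rng.choice(len(X), size=k, replace=False)]\n",
"    for _ in range(n_iter):\n",
"        # Assignment step: index of the nearest centroid for every point\n",
"        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)\n",
"        labels = dists.argmin(axis=1)\n",
"        # Update step: move each centroid to the mean of its points\n",
"        # (an empty cluster keeps its previous centroid)\n",
"        new_centroids = np.array([\n",
"            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]\n",
"            for j in range(k)\n",
"        ])\n",
"        if np.allclose(new_centroids, centroids):\n",
"            break\n",
"        centroids = new_centroids\n",
"    return labels, centroids\n",
"\n",
"# Two well-separated toy groups of 2-D points\n",
"X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],\n",
"                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])\n",
"toy_labels, toy_centroids = lloyd_kmeans(X_toy, k=2)\n",
"print(toy_labels)"
]
},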
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" NO Unnamed: 0 Menu subMenu news_from \\\n",
"0 1 0 정치 청와대 파이낸셜뉴스 \n",
"1 2 1 정치 국회/정당 뉴시스 \n",
"2 3 2 정치 북한 한겨레 \n",
"3 4 3 정치 행정 연합뉴스 \n",
"4 5 4 정치 국방/외교 더팩트 \n",
"\n",
" _press_title \\\n",
"0 文대통령 \"'오월 정신'은 모두의 것...국가폭력 진상 밝혀내야\" \n",
"1 통합당 \"5·18 발언 사과…희생 헛되지 않게 발벗고 나서야\" \n",
"2 보훈처, ‘6·25 참전’ 나바호족에 마스크 1만장 지원 \n",
"3 조길형 충주시장 \"수안보연수원 매입 절차 누락 내 책임\" \n",
"4 북한 선전매체 \"5·18 대학살자들 청산해야\" \n",
"\n",
" _press_content \n",
"0 -5·18민주화운동?기념식?참석...취임?후?3번째-국가기념일?지정?후?첫?5·18... \n",
"1 \"짐작할 수 없는 슬픔 속에 사는 유가족에 위로\"\"해야 할 일 분명…주호영 광주 방... \n",
"2 6·25 전쟁에 참전했던 미국의 원주민 나바호족 용사들에게 마스크 1만장과 손소독제... \n",
"3 시의회 강도 높은 질책에 \"모든 조사 겸허히 받겠다\" 사과 (충주=연합뉴스) 박... \n",
"4 북한이 5·18 민주화운동 40주년인 18일을 맞아 철저한 진상규명과 책임자들에 대... "
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from string import punctuation\n",
"from konlpy.tag import Hannanum, Okt\n",
"from sklearn.feature_extraction.text import CountVectorizer\n",
"from sklearn.cluster import KMeans\n",
"\n",
"# Two KoNLPy morphological analyzers; their noun extraction is compared below\n",
"hannanum = Hannanum()\n",
"okt = Okt()\n",
"\n",
"# Load the stop-word dictionary (one word per line).\n",
"# The txt file must be saved as ANSI, i.e. CP949 on Korean Windows (a superset of EUC-KR).\n",
"with open(\"C:\\\\Users\\\\user\\\\Documents\\\\PythonTest\\\\Dic\\\\StopWordKorean.txt\", 'r', encoding='cp949') as r_file:\n",
"    kr_stop = r_file.read().splitlines()\n",
"\n",
"# punctuation is a string of ASCII symbols such as [, ] and ?\n",
"stop_words = set(kr_stop + list(punctuation))\n",
"\n",
"# Load the news articles to be clustered\n",
"Data = pd.read_csv('C:\\\\Users\\\\user\\\\Documents\\\\PythonTest\\\\Data\\\\(18-25)정치경제.csv', engine=\"python\")\n",
"\n",
"Data.head()\n"
]
},
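{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first cell builds `stop_words`, but no later cell applies it. Assuming the set is meant to filter the extracted nouns, one option is to drop stop words from each noun list before joining it back into a string; another is to pass it to `CountVectorizer(stop_words=list(stop_words))`.\n",
"\n",
"The helper `remove_stop_words`, the toy stop-word set and the toy nouns in the next cell are hypothetical; in the notebook the input list would come from `hannanum.nouns(...)` or `okt.nouns(...)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from string import punctuation\n",
"\n",
"def remove_stop_words(tokens, stop_words):\n",
"    # Hypothetical helper: keep only the tokens that are not stop words\n",
"    return [tok for tok in tokens if tok not in stop_words]\n",
"\n",
"# Toy stop-word set: a few Korean function nouns plus ASCII punctuation\n",
"toy_stop_words = set(['것', '등', '수'] + list(punctuation))\n",
"toy_nouns = ['대통령', '것', '기념식', '?', '국가']\n",
"\n",
"print(' '.join(remove_stop_words(toy_nouns, toy_stop_words)))"
]
},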
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"hannanum_docs= 84 okt_docs= 84\n"
]
}
],
"source": [
"# Extract only the nouns from each article, then join each noun list back\n",
"# into a space-separated string that CountVectorizer can tokenize.\n",
"hannanum_docs = [' '.join(hannanum.nouns(text)) for text in Data['_press_content']]\n",
"okt_docs = [' '.join(okt.nouns(text)) for text in Data['_press_content']]\n",
"\n",
"print(\"hannanum_docs=\", len(hannanum_docs), \"okt_docs=\", len(okt_docs))\n"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"# Vectorize the noun strings into a document-term count matrix.\n",
"# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() exists from 1.0 on.\n",
"vec_hannanum = CountVectorizer()\n",
"X_hannanum = vec_hannanum.fit_transform(hannanum_docs)\n",
"df_hannanum = pd.DataFrame(X_hannanum.toarray(), columns=vec_hannanum.get_feature_names_out())\n",
"\n",
"vec_okt = CountVectorizer()\n",
"X_okt = vec_okt.fit_transform(okt_docs)\n",
"df_okt = pd.DataFrame(X_okt.toarray(), columns=vec_okt.get_feature_names_out())\n"
]
},
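{
"cell_type": "markdown",
"metadata": {},
"source": [
"`CountVectorizer`'s default `token_pattern`, `r'(?u)\\b\\w\\w+\\b'`, keeps only tokens of two or more word characters, so single-syllable Korean nouns (for example 법 or 돈) are silently dropped from `df_hannanum` and `df_okt`. If those nouns should count, a pattern that also accepts one-character tokens keeps them; the toy documents in the next cell are illustrative only."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.feature_extraction.text import CountVectorizer\n",
"\n",
"toy_docs = ['법 개정 국회', '돈 경제 국회']\n",
"\n",
"# Default pattern: tokens of 2+ word characters only\n",
"default_vec = CountVectorizer().fit(toy_docs)\n",
"\n",
"# Relaxed pattern: single-character tokens are kept too\n",
"one_char_vec = CountVectorizer(token_pattern=r'(?u)\\b\\w+\\b').fit(toy_docs)\n",
"\n",
"print(sorted(default_vec.vocabulary_))\n",
"print(sorted(one_char_vec.vocabulary_))"
]
},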
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total_hannanum = 84 sum_i_1= 83 sum_i_0= 1\n",
"total_okt = 84 sum_i_1= 83 sum_i_0= 1\n"
]
}
],
"source": [
"# K-means clustering (k=2) on each document-term matrix.\n",
"# n_init=10 matches the pre-1.4 scikit-learn default; random_state makes the labels reproducible.\n",
"kmeans_hannanum = KMeans(n_clusters=2, n_init=10, random_state=0).fit(df_hannanum)\n",
"kmeans_okt = KMeans(n_clusters=2, n_init=10, random_state=0).fit(df_okt)\n",
"\n",
"# Count the documents assigned to cluster 0 and cluster 1 for each analyzer\n",
"for name, model in [(\"hannanum\", kmeans_hannanum), (\"okt\", kmeans_okt)]:\n",
"    sum_i_0, sum_i_1 = np.bincount(model.labels_, minlength=2)\n",
"    print(\"total_\" + name + \" = \", len(model.labels_), \" sum_i_1=\", sum_i_1, \" sum_i_0=\", sum_i_0)"
]
},
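{
"cell_type": "markdown",
"metadata": {},
"source": [
"The plotting cell keeps a commented-out third cluster, which suggests other values of k were tried. One common heuristic for choosing k is the elbow method: fit `KMeans` for several k and watch `inertia_` (the sum of squared distances from each point to its nearest centroid), which generally decreases as k grows; the point where the decrease flattens is a candidate k.\n",
"\n",
"A sketch on synthetic data with three blobs (not the news articles); with the notebook's data you would pass `df_hannanum` or `df_okt` instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.cluster import KMeans\n",
"\n",
"# Synthetic data: three blobs of 20 points each around known centers\n",
"rng = np.random.default_rng(0)\n",
"centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])\n",
"X_blobs = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])\n",
"\n",
"# Fit K-means for k = 1..5 and record the inertia of each fit\n",
"inertias = {}\n",
"for k in range(1, 6):\n",
"    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_blobs)\n",
"    inertias[k] = model.inertia_\n",
"    print(k, round(model.inertia_, 1))"
]
},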
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2\n",
"[0 1]\n",
"1\n"
]
}
],
"source": [
"# Inspect the Okt-based model (use kmeans_hannanum to inspect the Hannanum one)\n",
"# Number of clusters\n",
"print(kmeans_okt.n_clusters)\n",
"\n",
"# Cluster label assigned to each document\n",
"print(kmeans_okt.labels_)\n",
"\n",
"# Number of iterations run\n",
"print(kmeans_okt.n_iter_)\n"
]
},
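{
"cell_type": "markdown",
"metadata": {},
"source": [
"Besides `n_clusters`, `labels_` and `n_iter_`, a fitted `KMeans` model exposes `cluster_centers_` (one row per cluster, one column per term) and `inertia_`. Assuming you want to describe each cluster by its most characteristic nouns, you can sort each centroid's columns and read off the highest-weighted terms.\n",
"\n",
"The toy documents in the next cell stand in for the noun strings; with the notebook's data you would use `vec_okt` and `kmeans_okt` (or the Hannanum pair) instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.feature_extraction.text import CountVectorizer\n",
"\n",
"# Toy noun strings: two about politics, two about the economy\n",
"toy_docs = ['국회 정당 선거 국회', '정당 선거 국회',\n",
"            '금리 환율 증시 금리', '환율 증시 금리']\n",
"vec = CountVectorizer()\n",
"X = vec.fit_transform(toy_docs)\n",
"model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)\n",
"\n",
"# Column index -> term, built from vocabulary_ (available in every scikit-learn version)\n",
"terms = np.empty(len(vec.vocabulary_), dtype=object)\n",
"for term, idx in vec.vocabulary_.items():\n",
"    terms[idx] = term\n",
"\n",
"# The three highest-weighted terms in each centroid\n",
"top_terms = {}\n",
"for cluster_id, center in enumerate(model.cluster_centers_):\n",
"    top_terms[cluster_id] = [terms[i] for i in center.argsort()[::-1][:3]]\n",
"    print(cluster_id, top_terms[cluster_id])"
]
},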
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" text cluster\n",
"0 -5·18민주화운동?기념식?참석...취임?후?3번째-국가기념일?지정?후?첫?5·18... 0\n",
"1 지난 15일 한화손해보험 임직원들이 여의도사옥에서 대회의실에서 홀몸 어르신을 위한 ... 1"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Pair each article with its cluster label (Okt model; use kmeans_hannanum for the Hannanum one)\n",
"resl = pd.DataFrame({'text': Data['_press_content'], 'cluster': kmeans_okt.labels_})\n",
"\n",
"resl\n"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" text cluster\n",
"1 지난 15일 한화손해보험 임직원들이 여의도사옥에서 대회의실에서 홀몸 어르신을 위한 ... 1\n",
"0 -5·18민주화운동?기념식?참석...취임?후?3번째-국가기념일?지정?후?첫?5·18... 0"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Sort the articles by cluster label (highest label first)\n",
"resl2 = resl.sort_values(by=['cluster'], ascending=False)\n",
"resl2\n"
]
},
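{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sorting groups same-cluster articles together. To see how many articles each cluster holds, or to read a few examples per cluster, `value_counts` and `groupby(...).head(...)` work on the same DataFrame. A minimal sketch on a hypothetical stand-in for `resl` (the texts are placeholders, not articles from the dataset):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in for resl: article text plus its cluster label\n",
"toy_resl = pd.DataFrame({\n",
"    'text': ['article a', 'article b', 'article c', 'article d'],\n",
"    'cluster': [0, 1, 0, 0],\n",
"})\n",
"\n",
"# Number of articles per cluster, ordered by cluster id\n",
"print(toy_resl['cluster'].value_counts().sort_index())\n",
"\n",
"# Up to two example articles from each cluster, in original row order\n",
"examples = toy_resl.groupby('cluster').head(2)\n",
"print(examples)"
]
},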
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXkAAAD6CAYAAABEUDf/AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAVoklEQVR4nO3df4xd5X3n8fe3Y4yZACVgh7AMjk3W3fAjxsCEGlgpJCSDF8iQTYmEYROkRbLagABRIAZHCkWdiDZRykZxflgESqTI4E1hffNrOwQS0aIW1y4OwRjWhqZlAop/UBPQFIjt7/5xrvHYvmN7fM+dO/fM+yVZ9zzPOfc8zz2WP/f4Oec8NzITSVI1/V67OyBJah1DXpIqzJCXpAoz5CWpwgx5SaowQ16SKqy0kI+Iroh4KiJ+WC/PjognI2JDRDwYEVPLakuSdHCirPvkI+ImoBc4OjMvjYgVwEOZ+UBEfAv4RWZ+c3/7mD59es6aNauU/kjSZLFmzZotmTmj0bopZTQQET3AJcAAcFNEBPBR4Mr6JvcDdwD7DflZs2axevXqMrokSZNGRPzraOvKGq65G7gV2FkvHwdsy8zt9fIQcGJJbUmSDlLTIR8RlwKbMnPNyOoGmzYcF4qIRRGxOiJWb968udnuSJJGKONM/nygPyJ+BTxAMUxzN3BMROwaDuoBXm705sxclpm9mdk7Y0bDISVJ0iFqekw+M28DbgOIiAuAmzPzqoj438DlFMF/NbCy2bYkVc/vfvc7hoaGePPNN9vdlQlv2rRp9PT0cNhhhx30e0q58DqKzwMPRMSfA08B32lhW5I61NDQEEcddRSzZs2iuGdDjWQmW7duZWhoiNmzZx/0+0oN+cz8OfDz+vKLwDll7l9S9bz55psG/EGICI477jjGeu3SJ16lMtRqcN11xavGzIA/OIdynAx5qVm1GixcCEuXFq8GvSYQQ15q1uAgDA8Xy8PDRVkd7Y477uArX/nKmN+3bds2vvGNbxxSm0uWLOGkk07iyCOPPKT3j8aQl5rV1wfd3cVyd3dR1qR0KCGfmezcuZNPfOITrFq1qvQ+GfJSs/r7YflyuPba4rW/v9090hh997vfZe7cuZxxxhl85jOf2WPdBRdc8M50K1u2bGHX/Frr1q3jnHPOYd68ecydO5cNGzawePFiXnjhBebNm8ctt9wCwJe//GU+9KEPMXfuXL74xS8C8Ktf/YpTTjmFz33uc5x11lm89NJLzJ8/nxNOOKH0z9bKWyilyaO/33AfT7VaMSzW19f0cV+3bh0DAwM88cQTTJ8+nVdffZWvfe1rB3zft771LW644Qauuuoq3n77bXbs2MFdd93FM888w9q1awEYHBxkw4YNrFq1isykv7+fxx9/nJkzZ/L8889z3333HfLwzsEy5CV1ll0XuoeH4b77mv7f02OPPcbll1/O9OnTATj22GMP6n3nnnsuAwMDDA0N8alPfYo5c+bss83g4CCDg4OceeaZALzxxhts2LCBmTNn8r73vY/58+cfcr8PlsM1kjpLyRe6M3O/tyZOmTKFnTuLuRdHPpV75ZVXUqvVOOKII7jooot47LHHGu77tttuY+3ataxdu5aNGzdyzTXXAPCud72rqX4fLENeUmcp+UL3hRdeyIoVK9i6dSsAr7766h7rZ82axZo1xfyL3//+99+pf/HFFzn55JO5/vrr6e/v5+mnn+aoo47i9ddff2ebiy66iHvvvZc33ngDgF//+tds2rSpqf6OlSEvqbOUfKH7tNNOY8mSJXz4wx/mjDPO4Kabbtpj/c0338w3v/lNzjvvPLZs2fJO/YMPPsjpp5/OvHnzeO655/jsZz/Lcccdx/nnn8/pp5/OLbfcQl9fH1deeSXnnnsuH/zgB7n88sv3+BIY6dZbb6Wnp4fh4WF6enq44447mvpcu5T2y1Bl6O3tTX80RJpc1q9fzymnnNLubn
SMRscrItZkZm+j7T2Tl6QKM+QlqcIMeUmqMENekirMkJekCjPkJanCDHlJ2st4TzU8PDzMJZdcwgc+8AFOO+00Fi9ePOZ9jMaQl6SSHOpUw1A8dPXcc8/x1FNP8cQTT/CTn/yklD4Z8pImvXZPNbx582Y+8pGPADB16lTOOusshoaGSvlszkIpqeOUONPwhJtqeNu2bfzgBz/ghhtuaO6D1RnykjpKyTMNT6iphrdv387ChQu5/vrrOfnkkw/9Q43gcI2kjlL2T+pOpKmGFy1axJw5c7jxxhub+1AjNB3yETEtIlZFxC8iYl1E/Fm9fnZEPBkRGyLiwYiY2nx3JU12Zf+k7kSZavgLX/gCr732GnfffXdzH2gvZZzJvwV8NDPPAOYBCyJiPvAXwF9l5hzg34FrSmhL0iRX9k/qToSphoeGhhgYGODZZ5/lrLPOYt68edxzzz3NfbC6Uqcajohu4O+BPwF+BLw3M7dHxLnAHZl50f7e71TD0uTjVMNj05aphiOiKyLWApuAR4AXgG2Zub2+yRBwYhltSZIOXikhn5k7MnMe0AOcAzT6Wm74X4aIWBQRqyNi9ebNm8vojiSprtS7azJzG/BzYD5wTETsukWzB3h5lPcsy8zezOydMWNGmd2R1CEm0i/UTWSHcpzKuLtmRkQcU18+AvgYsB74GXB5fbOrgZXNtiWpeqZNm8bWrVsN+gPITLZu3cq0adPG9L4yHoY6Abg/IroovjRWZOYPI+JZ4IGI+HPgKeA7JbQlqWJ6enoYGhrC4doDmzZtGj09PWN6T9Mhn5lPA2c2qH+RYnxekkZ12GGHMXv27HZ3o7J84lWSKsyQl6QKM+QlqcIMeUmqMENekirMkJekCjPkJanCDHlJqjBDXpIqzJCXpAoz5CWpwgx5SaowQ16SKsyQl6QKM+QlqcIMeUmqMENekirMkJekCjPkJanCDHlJqjBDXpIqzJCXpAoz5NX5ajW47rriVdIemg75iDgpIn4WEesjYl1E3FCvPzYiHomIDfXXdzffXWkvtRosXAhLlxavBr20hzLO5LcDf5qZpwDzgWsj4lRgMfBoZs4BHq2XpXINDsLwcLE8PFyUJb2j6ZDPzFcy85/ry68D64ETgcuA++ub3Q98stm2pH309UF3d7Hc3V2UJb1jSpk7i4hZwJnAk8DxmfkKFF8EEfGeMtuSAOjvh+XLizP4vr6iLOkdpYV8RBwJ/A1wY2b+NiIO9n2LgEUAM2fOLKs7mkz6+w13aRSl3F0TEYdRBPz3MvOhevVvIuKE+voTgE2N3puZyzKzNzN7Z8yYUUZ3JEl1ZdxdE8B3gPWZ+dURq2rA1fXlq4GVzbYlSRqbMoZrzgc+A/wyItbW624H7gJWRMQ1wL8Bny6hLUnSGDQd8pn598BoA/AXNrt/SdKh84lXSaowQ16SKsyQl6QKM+QlqcIMeUmqMENekirMkJekCjPkJanCDHlJqjBDXpIqzJCXpAoz5CWpwgx5SaowQ16SKsyQl6QKM+QlqcIMeUmqMENekirMkJekCjPkJanCDHlJqjBDXpIqzJCXpAorJeQj4t6I2BQRz4yoOzYiHomIDfXXd5fRliTp4JV1Jv/XwIK96hYDj2bmHODRelmSNI5KCfnMfBx4da/qy4D768v3A58soy1J0sFr5Zj88Zn5CkD99T0tbEuS1EDbL7xGxKKIWB0Rqzdv3tzu7khSpbQy5H8TEScA1F83NdooM5dlZm9m9s6YMaOF3ZGkyaeVIV8Drq4vXw2sbGFbkqQGyrqFcjnwD8B/iYihiLgGuAv4eERsAD5eL0uSxtGUMnaSmQtHWXVhGfuvtFoNBgehrw/6+9vdG0kV0/YLr5NarQYLF8LSpcVrrdbuHkmqGEO+nQYHYXi4WB4eLsqSVCJDvp36+qC7u1ju7i7KklSiUsbkdYj6+2H5csfkJbWMId9u/f2Gu6SWcbhGkirMkJekCjPkJanCDPmJqFaD667zvnlJTT
PkJxofkJJUIkN+ovEBKUklMuQnGh+QklQi75OfaHxASlKJDPmJyAekJJXE4RpJqjBDXpIqzJCXpAoz5CWpwgx5SaowQ36icmoDSSUw5CcipzaQVJLq3ydfq8Gdd8KWLXDVVfCHf1g8aPT7vw+vvbb7idKJ9PBRo6kNJkK/JHWc6oR8rbZvUNdq8Ed/BNu3F+UvfQmmTNldBrjnHsiEt9+G++4rnjZtd6D29RV9GR52agNJTWl5yEfEAuB/AV3APZl5V+mN7BreGB7eHdRPPglf/eqegQ77lt96a/fyRDlrdmoDSSVpachHRBewFPg4MAT8U0TUMvPZUhv69rf3HN644gr4j/8YrVPFmXsjE+ms2akNJJWg1RdezwE2ZuaLmfk28ABwWakt1Grw6KN71o0W8FAE/O81+NgRcOONBqukSml1yJ8IvDSiPFSvK8/g4J5DLgdj58596zKLC7GSVCGtDvloULfHWElELIqI1RGxevPmzWNvoazhla6uiTNUI0klaXXIDwEnjSj3AC+P3CAzl2Vmb2b2zpgxY+wt9PfD0Uc31cl6R4qLtbvs72Gk0db5AJOkiSYzW/aH4sLui8BsYCrwC+C00bY/++yz85DcfntmEdPN/enqKvZ18cWZU6cWdd3dmStX7m5r5cqibu91o9VLUosBq3OUXG3pmXxmbgeuA/4WWA+syMx1pTc0MAC33w6nnw7nnVec2Te6uHogO3bAX/4l/PjHxX3zsO/vrI72G6z+NqukCajl0xpk5o8z8w8y8/2ZOdCyhgYG4Je/hCeeKC6gPvzw7t9K7eoqwr+ra//76Ora9z76vW+r7OuDqVOL5alTd6/zt1klTUDVnbtm5Jn1jh1w5pnw0ENw8cXw/vfvue3RR8PZZ8PnP787qA8/vNi20ROwEXu+wu4HmK69dmI8NStJVDnkG51Z9/fDj34ECxbsue1vfwtr1hTLu4J6xYpi273DeuQtm2+9teewTH8/fP3rBrykCaO6Ib+/M+uRXwAj1WoHDmqHZSR1kOpMUNbIaFMD7PoCuPPO3Wfwu+oPZp/OKyOpQ0SONo9LG/T29ubq1avHt9ElS3afwQ+07rqwJLVKRKzJzN6G6yZ9yEtSh9tfyFd3TF7l8UleqWMZ8mWocgj6U4RSRzPkm1X1EPRJXqmjGfLNqnoIesuo1NEM+WZVPQR9klfqaN5dU4ZGPyIuSeNkf3fXVPthqPHSzO+xdsIXRCf0UVJDDte0UydctO2EPkoalSHfTp1w0bYT+ihpVIZ8O3XCRdtO6KOkUTkm3y675sy5+GI4/viJO97thGxSRzPk22HJEvjSl4rlZ54pfrpwIodnMxeWJbWVwzXtsPfFSy9mSmoRQ74d9j4r9ixZUos4XNMOu+atdx57SS3mE6+S1OGcT16SJqmmQj4iPh0R6yJiZ0T07rXutojYGBHPR8RFzXVTknQomh2Tfwb4FPDtkZURcSpwBXAa8J+An0bEH2TmjibbkySNQVNn8pm5PjOfb7DqMuCBzHwrM/8F2Aic00xbkqSxa9WY/InASyPKQ/U6SdI4OuBwTUT8FHhvg1VLMnPlaG9rUNfwNp6IWAQsApg5c+aBuiNJGoMDhnxmfuwQ9jsEnDSi3AO8PMr+lwHLoLiF8hDakiSNolXDNTXgiog4PCJmA3OAVS1qS5I0imZvofzvETEEnAv8KCL+FiAz1wErgGeB/wtc6501kjT+mrqFMjMfBh4eZd0A4PP6ktRGPvEqSRVmyEtShRnyklRhhrwkVZghL0kVZshLUoUZ8pJUYYa8JFWYIS9JFWbIS1KFGfKSVGGGvCRVmCEvSRVmyEtShRnyklRhhrwkVZghL0kVZshLUoUZ8pJUYYa8JFWYIS9JFWbIS1KFGfKSVGFNhXxEfDkinouIpyPi4Yg4ZsS62yJiY0Q8HxEXNd9VSdJYNXsm/whwembOBf4fcBtARJwKXAGcBiwAvhERXU22JUkao6ZCPjMHM3N7vfiPQE
99+TLggcx8KzP/BdgInNNMW5KksStzTP5/Aj+pL58IvDRi3VC9TpI0jqYcaIOI+Cnw3garlmTmyvo2S4DtwPd2va3B9jnK/hcBiwBmzpx5EF2WJB2sA4Z8Zn5sf+sj4mrgUuDCzNwV5EPASSM26wFeHmX/y4BlAL29vQ2/CCRJh6bZu2sWAJ8H+jNzeMSqGnBFRBweEbOBOcCqZtqSJI3dAc/kD+DrwOHAIxEB8I+Z+ceZuS4iVgDPUgzjXJuZO5psS5I0Rk2FfGb+5/2sGwAGmtm/JKk5PvEqSRVmyEtShRnyklRhhrwkVZghL0kVZshLUhvVanDddcVrKxjyktQmtRosXAhLlxavrQh6Q16S2mRwEIbrcwUMDxflshnyktQmfX3Q3V0sd3cX5bI1O62BJOkQ9ffD8uXFGXxfX1EumyEvSW3U39+acN/F4RpJqjBDXpIqzJCXpAoz5CWpwgx5SaowQ16SKsyQl6QKi8xsdx/eERGbgX9tdz/qpgNb2t2JNvMYeAzAYwAT/xi8LzNnNFoxoUJ+IomI1ZnZ2+5+tJPHwGMAHgPo7GPgcI0kVZghL0kVZsiPblm7OzABeAw8BuAxgA4+Bo7JS1KFeSYvSRVmyDcQEQsi4vmI2BgRi9vdn/EQEfdGxKaIeGZE3bER8UhEbKi/vrudfWy1iDgpIn4WEesjYl1E3FCvnxTHISKmRcSqiPhF/fP/Wb1+dkQ8Wf/8D0bE1Hb3tdUioisinoqIH9bLHXsMDPm9REQXsBT4b8CpwMKIOLW9vRoXfw0s2KtuMfBoZs4BHq2Xq2w78KeZeQowH7i2/nc/WY7DW8BHM/MMYB6wICLmA38B/FX98/87cE0b+zhebgDWjyh37DEw5Pd1DrAxM1/MzLeBB4DL2tynlsvMx4FX96q+DLi/vnw/8Mlx7dQ4y8xXMvOf68uvU/wjP5FJchyy8Ea9eFj9TwIfBb5fr6/s598lInqAS4B76uWgg4+BIb+vE4GXRpSH6nWT0fGZ+QoUAQi8p839GTcRMQs4E3iSSXQc6sMUa4FNwCPAC8C2zNxe32Qy/Hu4G7gV2FkvH0cHHwNDfl/RoM5bkCaRiDgS+Bvgxsz8bbv7M54yc0dmzgN6KP5Xe0qjzca3V+MnIi4FNmXmmpHVDTbtmGPgb7zuawg4aUS5B3i5TX1pt99ExAmZ+UpEnEBxdldpEXEYRcB/LzMfqldPuuOQmdsi4ucU1yaOiYgp9TPZqv97OB/oj4iLgWnA0RRn9h17DDyT39c/AXPqV9OnAlcAtTb3qV1qwNX15auBlW3sS8vVx16/A6zPzK+OWDUpjkNEzIiIY+rLRwAfo7gu8TPg8vpmlf38AJl5W2b2ZOYsin/7j2XmVXTwMfBhqAbq3+J3A13AvZk50OYutVxELAcuoJht7zfAF4H/A6wAZgL/Bnw6M/e+OFsZEfFfgb8Dfsnu8djbKcblK38cImIuxUXFLooTwBWZeWdEnExxA8KxwFPA/8jMt9rX0/ERERcAN2fmpZ18DAx5Saowh2skqcIMeUmqMENekirMkJekCjPkJanCDHlJqjBDXpIqzJCXpAr7/ybhF0po1rO7AAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"from sklearn.decomposition import PCA\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Project the high-dimensional count vectors onto 2 principal components for plotting\n",
"# (Okt matrix; use df_hannanum / kmeans_hannanum for the Hannanum result)\n",
"pca = PCA(n_components=2)\n",
"principalComponents = pca.fit_transform(df_okt)\n",
"principalDf = pd.DataFrame(data=principalComponents,\n",
"                           columns=['principal component 1', 'principal component 2'])\n",
"principalDf.index = Data['_press_content']\n",
"\n",
"labels = kmeans_okt.labels_\n",
"\n",
"# x-axis: first principal component, y-axis: second; one color per cluster\n",
"plt.scatter(principalDf.iloc[labels == 0, 0], principalDf.iloc[labels == 0, 1], s=10, c='red', label='cluster1')\n",
"plt.scatter(principalDf.iloc[labels == 1, 0], principalDf.iloc[labels == 1, 1], s=10, c='blue', label='cluster2')\n",
"# plt.scatter(principalDf.iloc[labels == 2, 0], principalDf.iloc[labels == 2, 1], s=10, c='green', label='cluster3')\n",
"plt.xlabel('principal component 1')\n",
"plt.ylabel('principal component 2')\n",
"plt.legend()\n"
]
},
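{
"cell_type": "markdown",
"metadata": {},
"source": [
"Classic PCA centers the data, which destroys sparsity, so it is normally run on a dense matrix, as in the cell above. For larger corpora, `TruncatedSVD` accepts the sparse output of `CountVectorizer` directly. Note that it is a different technique (it does not center the data); the sketch below swaps it in for illustration with toy documents, not as the notebook's method."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.decomposition import TruncatedSVD\n",
"from sklearn.feature_extraction.text import CountVectorizer\n",
"\n",
"toy_docs = ['국회 정당 선거', '정당 선거 국회 국회',\n",
"            '금리 환율 증시', '환율 증시 금리 금리']\n",
"X_sparse = CountVectorizer().fit_transform(toy_docs)  # SciPy sparse matrix\n",
"\n",
"# Project the sparse counts onto 2 components without converting to dense\n",
"svd = TruncatedSVD(n_components=2, random_state=0)\n",
"coords = svd.fit_transform(X_sparse)\n",
"print(coords.shape)"
]
},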
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}