woodong woodongk

  • Samsung Electronics, Samsung Research
  • Seoul, Korea
woodongk / I'm a night 🦉
Last active October 29, 2020 00:07
I'm a night 🦉
🌞 Morning 33 commits █▎░░░░░░░░░░░░░░░░░░░ 6.4%
🌆 Daytime 165 commits ██████▋░░░░░░░░░░░░░░ 32.1%
🌃 Evening 180 commits ███████▎░░░░░░░░░░░░░ 35.0%
🌙 Night 136 commits █████▌░░░░░░░░░░░░░░░ 26.5%
woodongk / count_ngram.py
Created May 26, 2020 04:48
Corpus n-gram counter
from collections import Counter
from itertools import chain
def ngram_count(docs_tokenized, n, n_display=50):
    '''
    Args:
        docs_tokenized : 2D list of tokens
            e.g. [['문재인', '원전', '국민', '혈세', '물어내', '문재인', '대통령', '물어내'],
                  ['전쟁', '제일', '먼저', '아가리', '대통령', '특수', '부대', '실미'],
        n : which n-gram to count, e.g., unigram : 1, bigram : 2
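The preview above cuts off before the function body, so the counting step itself is not shown. As a minimal sketch of how n-grams can be counted with `Counter` and `chain` (the two imports the gist uses), assuming each document is a list of string tokens: the helper names `ngrams` and `count_ngrams` are illustrative and are not taken from the gist, and the `n_display` parameter is left out.

```python
from collections import Counter
from itertools import chain


def ngrams(tokens, n):
    """Return an iterator over every run of n consecutive tokens, as tuples."""
    # zip() stops at the shortest slice, so documents shorter than n yield nothing.
    return zip(*(tokens[i:] for i in range(n)))


def count_ngrams(docs_tokenized, n):
    """Count n-grams over all documents; an n-gram never spans two documents."""
    return Counter(chain.from_iterable(ngrams(doc, n) for doc in docs_tokenized))


# Hypothetical toy corpus, used only for illustration.
docs = [["a", "b", "a", "b"], ["b", "a"]]
bigram_counts = count_ngrams(docs, 2)
print(bigram_counts.most_common(2))
```

Building the n-grams per document and only then chaining them keeps the last token of one document from pairing with the first token of the next.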
woodongk / markdown.md
Last active May 14, 2020 08:10 — forked from ihoneymon/how-to-write-by-markdown.md
How to use Markdown

[Common] How to write Markdown

1. About Markdown

1.1. What is Markdown?

Markdown is a text-based markup language created by John Gruber in 2004. It is easy to write and read, and it can be converted to HTML. Its syntax is very simple, built from special characters and plain text, so content can be written quickly even on the web and its structure recognized at a glance. Markdown's recent popularity is largely thanks to GitHub (https://github.com). README.md, the file that records information about a GitHub repository, is the first Markdown document that anyone using GitHub encounters. As its main strength became apparent, namely that installation steps, source code descriptions, issues, and the like can be recorded simply and readably, Markdown gradually spread to many other places.

1.2. Pros and cons of Markdown

1.2.1. Pros

woodongk / word_cloud.py
Last active May 26, 2020 04:46
Making a word cloud
import numpy as np
def generate_circular_wordcloud(strings):
    """Returns a circle-shaped word cloud
    Example:
        strings (str): "기억 니은 디귿 기억 기억"
        strings (dict): {"기억":30, "니은":10, "디귿":1}
    """
    # circular mask
    x, y = np.ogrid[:1000, :1000]
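The preview stops right after the `np.ogrid` call, so the rest of the mask construction is not visible. A minimal sketch of one common way to turn such a grid into a circular mask follows. The centre and radius are assumptions, not values from the gist, and `circular_mask` is an illustrative helper name. In the `wordcloud` package, white (255) mask entries are treated as masked out, so the array is 255 outside the circle and 0 inside.

```python
import numpy as np


def circular_mask(size=1000, radius=None):
    """Return a size x size integer array: 0 inside the circle, 255 outside."""
    # Assumed geometry: circle centred on the canvas, radius 45% of the side.
    radius = size * 0.45 if radius is None else radius
    center = size / 2
    # ogrid gives a column vector x and a row vector y that broadcast to a grid.
    x, y = np.ogrid[:size, :size]
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    return 255 * outside.astype(int)


mask = circular_mask(1000)
print(mask.shape)
```

An array like this could then be passed as the `mask` argument of `wordcloud.WordCloud`, which is presumably what the gist goes on to do.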
woodongk / crawling_naver_news_comments.py
Last active April 6, 2020 23:28
Scraping comments from Naver News
# Source - https://wikidocs.net/61221
from selenium import webdriver
import time
def get_comments(URL, imp_time=5, delay_time=0.1):
    # web driver
    driver = webdriver.Chrome('/usr/local/bin/chromedriver')  # path to chromedriver
    driver.implicitly_wait(imp_time)
    driver.get(URL)
woodongk / get_outlier.py
Last active April 4, 2020 13:47
Detecting outliers in a DataFrame - using IQR
# Source - the book 파이썬을 이용한 머신러닝, 딥러닝 실전 개발 입문 (an introduction to practical machine learning and deep learning development with Python)
import numpy as np
def get_outlier(df=None, column=None, weight=1.5):
    '''Takes a DataFrame and the column in which to detect outliers.
    Multiplies the IQR by 1.5 (the default weight), finds the outliers on that basis, and returns the index of the rows that contain them.
    '''
    column_x = df[column]
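The body of `get_outlier` is cut off in the preview. As a rough sketch of the IQR rule the docstring describes, the version below works on a plain list and uses the standard-library `statistics.quantiles` instead of the NumPy and pandas calls the gist presumably relies on. The function name `iqr_outlier_indices` and the sample data are hypothetical.

```python
from statistics import quantiles


def iqr_outlier_indices(values, weight=1.5):
    """Return positions of values outside [Q1 - weight*IQR, Q3 + weight*IQR]."""
    # method="inclusive" interpolates linearly between the sorted data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lowest = q1 - weight * iqr
    highest = q3 + weight * iqr
    return [i for i, v in enumerate(values) if v < lowest or v > highest]


# Hypothetical sample: the last value is far from the rest.
data = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outlier_indices(data)
print(outliers)
```

A larger `weight` widens the accepted range and flags fewer points; 1.5 is the conventional default that the gist also uses.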
woodongk / text_preprocessing.py
Last active July 10, 2022 07:35
Korean-Text-Preprocessing in Python
import re
from konlpy.tag import Mecab
from khaiii import KhaiiiApi
def remove_brackets(string, left_paren_type, right_paren_type):
    '''Remove brackets (parentheses) and their contents within a string
    Args:
        left_paren_type: '[', '(', etc.
        right_paren_type: ']', ')', etc.
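The preview ends inside the docstring, so the gist's own implementation is not shown. One way such a function could be written with `re` is sketched below, assuming single-character bracket types and no nesting. The name `remove_bracketed` and the default arguments are illustrative, not the gist's.

```python
import re


def remove_bracketed(string, left_paren_type="(", right_paren_type=")"):
    """Delete each bracket pair and the text inside it (non-nested pairs only)."""
    # Escape the bracket characters so '[', '(' and so on are matched literally.
    left = re.escape(left_paren_type)
    right = re.escape(right_paren_type)
    # [^right]* consumes everything up to the first closing bracket.
    pattern = f"{left}[^{right}]*{right}"
    return re.sub(pattern, "", string)


print(remove_bracketed("news [photo] title", "[", "]"))
print(remove_bracketed("a(b)c(d)e"))
```

Nested brackets such as `a(b(c)d)e` are not fully handled by this pattern. Handling them would need repeated substitution or a small stack-based parser.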