@dee-walia20
Last active February 18, 2020 16:10
Data Cleaning_2
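# Count every word across all tweets once and reuse the result
word_counts = df.Treated_Tweet.str.split(expand=True).stack().value_counts()
# The 10 most frequent words (value_counts sorts by count, descending)
freq_words = set(word_counts.index[:10])
# Words that occur exactly once in the whole column (sets give fast lookups)
rare_words = set(word_counts[word_counts == 1].index)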
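To check which words the two rules select, the same counting can be reproduced in plain Python with `collections.Counter` instead of pandas. This is a swapped-in technique, not the author's code. The three-tweet corpus is hypothetical, and the cutoff is 2 rather than 10 so that the tiny example still leaves one word that is neither frequent nor rare.

```python
from collections import Counter

# Hypothetical toy corpus standing in for df.Treated_Tweet
tweets = [
    "great flight great crew",
    "flight delayed great",
    "great flight crew lost bags",
]

# Count every whitespace-separated token across all tweets
word_counts = Counter(word for tweet in tweets for word in tweet.split())

# Top-N most frequent words (N=2 here instead of 10 for the tiny corpus)
freq_words = {word for word, _ in word_counts.most_common(2)}

# Words that occur exactly once in the whole corpus
rare_words = {word for word, count in word_counts.items() if count == 1}

print(sorted(freq_words))  # ['flight', 'great']
print(sorted(rare_words))  # ['bags', 'delayed', 'lost']
```

Here "great" occurs 4 times and "flight" 3 times, so they form the frequent set. "crew" (2 occurrences) is the only word that survives both filters.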
# Remove frequent and rare words
def remove_noise_words(text):
    edited_text = text.split()
    edited_text = [word for word in edited_text if word not in freq_words]
    edited_text = [word for word in edited_text if word not in rare_words]
    return " ".join(edited_text)
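`remove_noise_words` reads `freq_words` and `rare_words` as globals and is presumably applied row by row, e.g. `df.Treated_Tweet.apply(remove_noise_words)`. That call is an assumption, because the gist does not show this step. The sketch below bundles counting and filtering into one hypothetical helper, `clean_corpus`, exposes the hard-coded 10 as a `top_n` parameter, and runs on a made-up corpus.

```python
from collections import Counter


def clean_corpus(texts, top_n=10):
    """Drop the top_n most frequent words and every word seen only once.

    Hypothetical helper mirroring freq_words / rare_words / remove_noise_words
    without relying on module-level globals.
    """
    counts = Counter(word for text in texts for word in text.split())
    # If several words tie at the top_n cutoff, most_common keeps the ones
    # encountered first, so the exact frequent set can depend on row order.
    noise = {word for word, _ in counts.most_common(top_n)}
    noise |= {word for word, count in counts.items() if count == 1}
    # Rebuild each text from the words that are not noise
    return [" ".join(w for w in text.split() if w not in noise) for text in texts]


tweets = [
    "great flight great crew",
    "flight delayed great",
    "great flight crew lost bags",
]
print(clean_corpus(tweets, top_n=2))  # ['crew', '', 'crew']
```

With `top_n=2` the second tweet consists only of frequent and rare words and ends up as an empty string. Rows like this may need to be dropped or handled separately before any later vectorization step.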