地址:https://2021.naacl.org/program/accepted/
Paper List
会议接受论文通过爬虫提取到下面的excel中
共计:528篇
NAACL2021 Paper List.xlsx
NAACL2021 Paper List.xlsx
快速检索小工具
使用.py文件快速检索NAACL
import pandas as pd
import requests
from lxml import etree
def get_pdf(key):
url_format = "https://arxiv.org/search/?query={}&searchtype=all&abstracts=show&order=-announced_date_first&size=50"
rep = requests.get(url_format.format(key))
body = etree.HTML(rep.content)
ols = body.xpath(r'//*[@id="main-container"]/div[2]/p[1]/text()')
if ols:
ols = "Sorry, your query for all: {} produced no results.".format(r"Knowledge Guided Metric Learning for Few-Shot Text Classification get")
print(ols)
else:
ols = body.xpath(r'//*[@id="main-container"]/div[2]/ol/li')
for ol in ols:
print("[PDF]:",ol.xpath(r'./p[1]/text()')[0].replace("\n","").replace(" ",""),ol.xpath(r'./div/p/span/a[1]/@href')[0])
# 查询关键词列表函数
def Search_domain_print(key_list,df,withPdf=False):
keys = set([key.lower() for key in key_list])
for key in keys:
count = 0
for i in df["title"].values.tolist():
if key in i.lower():
count = count + 1
print("[{}]-[{}]:{}".format(key,count,i))
if withPdf:
get_pdf(i)
print()
if __name__ == '__main__':
excel = pd.read_excel("data/NAACL2021 Paper List.xlsx")
key_list = ["Text Classification",
# "Sentiment Analysis","Knowledge Graph",
]
# withPdf设置为True可以直接检索并获取pdf,但速度会很慢。
# 也可以使用单步函数get_pdf("标题")直接查询要的文章
excel = pd.read_excel('data/NAACL2021 Paper List.xlsx')
Search_domain_print(key_list,excel,withPdf=False)
Attention:
Knowledge Guided Metric Learning for Few-Shot Text Classification
少样本学习、文本分类、知识导向
DART: Open-Domain Structured Data Record to Text Generation
开放域数据、结构化数据、文本生成
Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning
少样本学习、三重网络、数据增强、文本分类
News Headline Grouping As A Challenging NLU Task
新闻标题、NLU