Task: We will design and prototype ML-based procedural methods for generating crypto art in the form of NFTs, with a focus on actionable (conditional) strategies.

Stage 1

Understand the concepts behind NFTs for art
Find suitable research methods for Twitter sentiment and topic analysis
Identify trust and discrimination issues in NFT art

NFTs for Art and Collectables: Primer and Outlook

Non-fungible tokens (NFTs)

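Definition: digital representations of unique objects, physical or virtual, typically associated with unique digital media such as images or music
Key feature: they are created and managed on blockchains
Structure: a set of strings on the blockchain containing a pointer to the artwork's location (analogous to an access pass)
Notable example: BEEPLE-COLLECT ("beeple NFTs, collectibles, weird crap."); Beeple's NFT work ranks sixth among the highest auction prices ever paid for a living artist's work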
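The striking image or video that an NFT owner holds does not actually exist on the blockchain.
Instead, the token refers to a file located elsewhere on the web.
If that file is stored at a conventional web address and ever goes offline, the NFT owner is at risk.
The NFT industry has solutions to this problem, but stakeholders have not yet settled on best practices, which makes it hard for buyers to assess which NFTs will store their data safely over the long term.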
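A minimal Python sketch of why the pointer matters, assuming a hypothetical token record that stores a `token_uri` plus a SHA-256 digest of the media taken at mint time. The field names and the digest check are illustrative assumptions in the spirit of content addressing (as used by IPFS), not a particular NFT standard: the chain holds only the pointer, so an owner can detect a missing or swapped file but cannot recover it.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TokenRecord:
    """Hypothetical on-chain NFT record: a pointer and a digest, never the media itself."""
    token_id: int
    owner: str
    token_uri: str        # where the artwork lives off-chain
    content_sha256: str   # digest of the media at mint time (illustrative assumption)


def check_media(record: TokenRecord, fetched: Optional[bytes]) -> str:
    """Classify what the owner actually gets when resolving the pointer."""
    if fetched is None:
        return "missing"   # host went offline: the token now points at nothing
    if hashlib.sha256(fetched).hexdigest() != record.content_sha256:
        return "altered"   # file was swapped or modified after minting
    return "intact"


art = b"<svg>...</svg>"
rec = TokenRecord(1, "alice", "https://example.com/art/1.svg", hashlib.sha256(art).hexdigest())
print(check_media(rec, art))         # intact
print(check_media(rec, None))        # missing
print(check_media(rec, b"<svg/>"))   # altered
```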

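Are art NFTs exclusive?

Owning an NFT does not grant exclusive access; it does not affect others' ability to download and enjoy the work.
Most NFTs do not grant copyright or other traditional intellectual property rights.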

What are NFTs good for?

represent ownership of any asset that takes the form of unique units
represent real-world objects such as real estate, luxury goods, or event tickets
serve as identity documents or credentials that are usable on blockchains

Blockchains

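What: an ever-growing list, specifically an ordered sequence of messages maintained by a network of servers. (A blockchain is a shared, immutable ledger that facilitates recording transactions and tracking assets in a business network; in essence, it is a decentralized database.)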
Properties:
- Append-only posting: Messages, once posted, cannot be removed.
- Immutability: Messages, once posted, cannot be modified, nor can their ordering.
- World readability: Anyone can view the full contents of the blockchain, i.e., all messages.
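The three properties can be illustrated with a toy, single-machine hash chain in Python: each entry stores the hash of the previous one, so editing or reordering any earlier message is detectable by recomputing the links. This is a sketch under simplifying assumptions: it shows tamper evidence only, whereas on a real blockchain it is the network's consensus protocol that stops anyone from simply recomputing every hash after an edit.

```python
import hashlib
import json


class ToyChain:
    """Append-only, hash-linked message log (a toy stand-in for a blockchain)."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.blocks: list[dict] = []   # world-readable: anyone can iterate over this

    @staticmethod
    def _hash(index: int, message: str, prev_hash: str) -> str:
        body = {"index": index, "message": message, "prev_hash": prev_hash}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, message: str) -> None:
        """The only write operation: add a message at the end."""
        prev_hash = self.blocks[-1]["hash"] if self.blocks else self.GENESIS
        index = len(self.blocks)
        self.blocks.append({"index": index, "message": message, "prev_hash": prev_hash,
                            "hash": self._hash(index, message, prev_hash)})

    def is_valid(self) -> bool:
        """Recompute every link; any edit or reordering breaks the chain."""
        prev_hash = self.GENESIS
        for i, block in enumerate(self.blocks):
            if block["prev_hash"] != prev_hash or block["hash"] != self._hash(i, block["message"], prev_hash):
                return False
            prev_hash = block["hash"]
        return True


chain = ToyChain()
chain.append("alice mints token 1")
chain.append("alice transfers token 1 to bob")
print(chain.is_valid())                                 # True
chain.blocks[0]["message"] = "mallory mints token 1"    # attempt to rewrite history
print(chain.is_valid())                                 # False
```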

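Relationship between blockchains and digital currencies such as Bitcoin

A blockchain is not the same thing as Bitcoin.
The blockchain is Bitcoin's underlying technology and infrastructure, and Bitcoin is one application of blockchain technology.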

Wallets

allow users to obtain and transact in cryptocurrency
the cryptocurrency itself resides on a blockchain

Cryptographic keys
A private key, very loosely analogous to a password, enables the owner to authorize the creation, sale, or transfer of her NFTs and to receive the proceeds of NFT sales. Likewise, it allows a user to spend cryptocurrency to purchase NFTs and take ownership of them.
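The role of the private key can be illustrated with a toy digital signature. The sketch below uses textbook RSA with tiny, insecure primes purely to show the idea: only the private-key holder can produce a valid authorization, while anyone holding the public key can verify it. Real blockchains such as Ethereum use ECDSA over secp256k1 instead, and the transfer-message format here is hypothetical.

```python
import hashlib

# Toy textbook-RSA key pair with tiny primes (insecure; illustration only)
P, Q = 61, 53
N = P * Q                           # public modulus (3233)
E = 17                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (modular inverse, Python 3.8+)


def digest(message: str) -> int:
    """Hash the message and reduce it into the toy modulus range."""
    return int.from_bytes(hashlib.sha256(message.encode()).digest(), "big") % N


def sign(message: str, private_exp: int = D) -> int:
    """Only the holder of the private exponent can produce this value."""
    return pow(digest(message), private_exp, N)


def verify(message: str, signature: int, public_exp: int = E) -> bool:
    """Anyone with the public key (N, E) can check the signature."""
    return pow(signature, public_exp, N) == digest(message)


order = "transfer token 1 from alice to bob"
sig = sign(order)
print(verify(order, sig))             # True
print(verify(order, (sig + 1) % N))   # False (tampered signature)
```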

Minting:
The process of creating an NFT: publishing a uniquely identified token on the blockchain so that it can be purchased.
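Minting and transfer can be sketched as a toy in-memory ledger, loosely modeled on the owner and URI bookkeeping of the ERC-721 token standard and its metadata extension (`ownerOf`, `tokenURI`). Everything here, including the class name and the sequential IDs, is a hypothetical simplification; a real contract runs on-chain and checks the sender's signature rather than trusting a `sender` argument.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNFTRegistry:
    """Hypothetical ERC-721-style ledger: each unique token ID has one owner and one URI."""
    owners: dict[int, str] = field(default_factory=dict)
    uris: dict[int, str] = field(default_factory=dict)
    next_id: int = 1

    def mint(self, creator: str, token_uri: str) -> int:
        """Publish a new, uniquely numbered token owned by its creator."""
        token_id = self.next_id
        self.owners[token_id] = creator
        self.uris[token_id] = token_uri
        self.next_id += 1
        return token_id

    def transfer(self, token_id: int, sender: str, recipient: str) -> None:
        """Only the current owner may move the token (signature checks omitted)."""
        if self.owners.get(token_id) != sender:
            raise PermissionError(f"{sender} does not own token {token_id}")
        self.owners[token_id] = recipient


registry = ToyNFTRegistry()
token = registry.mint("alice", "https://example.com/art/1.json")
registry.transfer(token, "alice", "bob")
print(registry.owners[token])   # bob
```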

Marketplaces:

NFTs for sale can be listed on marketplaces, websites that act as front ends for the blockchains on which the NFTs reside.
Marketplaces can be viewed in an ordinary browser (even without use of a wallet).
provide technical tools for the creation of NFT smart contracts.
collect commissions on the NFT sales they intermediate

Smart contracts

Smart contracts are digital contracts stored on a blockchain that are automatically executed when predetermined terms and conditions are met.
They typically are used to automate the execution of an agreement so that all participants can be immediately certain of the outcome, without any intermediary’s involvement or time loss. They can also automate a workflow, triggering the next action when conditions are met.
How: they work by following simple “if/when…then…” statements
Benefits: transparency, integrity, and no need for a trusted third party
Problem solved: fair exchange; an efficient replacement for trusted third parties
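The fair-exchange idea can be sketched in Python as a toy escrow "contract" for an NFT sale: the swap executes only when both conditions hold (token deposited and payment received), so neither party has to trust the other or an intermediary. The class, its fields, and the refund rule are hypothetical; on a real chain this logic would be written in a contract language such as Solidity and enforced by the network.

```python
from dataclasses import dataclass


@dataclass
class ToyEscrow:
    """Hypothetical escrow for one NFT sale: settles only when both sides have deposited."""
    seller: str
    buyer: str
    price: int
    token_deposited: bool = False
    paid: int = 0
    settled: bool = False

    def deposit_token(self, caller: str) -> None:
        if caller != self.seller:
            raise PermissionError("only the seller can deposit the token")
        self.token_deposited = True
        self._settle_if_ready()

    def deposit_payment(self, caller: str, amount: int) -> None:
        if caller != self.buyer:
            raise PermissionError("only the buyer can pay")
        if self.settled:
            raise RuntimeError("sale already settled")
        self.paid += amount
        self._settle_if_ready()

    def _settle_if_ready(self) -> None:
        # if/when the token is in escrow AND payment >= price, then swap
        if not self.settled and self.token_deposited and self.paid >= self.price:
            self.settled = True

    def outcome(self) -> dict:
        """Final allocation: who owns the token and where the funds go."""
        if self.settled:
            return {"token_owner": self.buyer, "to_seller": self.price,
                    "refund_to_buyer": self.paid - self.price}
        return {"token_owner": self.seller, "to_seller": 0, "refund_to_buyer": self.paid}


sale = ToyEscrow(seller="alice", buyer="bob", price=100)
sale.deposit_payment("bob", 60)
print(sale.outcome())   # {'token_owner': 'alice', 'to_seller': 0, 'refund_to_buyer': 60}
sale.deposit_token("alice")
sale.deposit_payment("bob", 40)
print(sale.outcome())   # {'token_owner': 'bob', 'to_seller': 100, 'refund_to_buyer': 0}
```

Until both conditions are met, the outcome leaves the token with the seller and keeps the buyer's money refundable, which is the fair-exchange guarantee.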

Decentralized identity(DID):
Decentralized identity refers broadly to the idea of users controlling their own blockchain-based credentials, that is, attestations to personal characteristics.

Structure and operation of the art market

Artists
Gallerists/Dealers
Auction houses
Buyers (Collectors and Art Investors)
Museums
Critics

NFT ethics

ML

Description
Machine Learning (ML) algorithms are software that can take on roles typically filled by human beings. Machine learning uses computer algorithms that improve automatically through experience and the use of data.
Application to NFT Artwork
Machine learning facilitates computer generated NFT artwork and art markets with non-human participants.
Current Example
Algorithms can generate an NFT artwork from a generative model. Algorithms could then price and bid on the piece, and a machine-learning-driven critic could assess its merits.
Future Potential
ML can partially or fully supplant the roles of all art-market participants: artists, buyers, gallerists, and sellers. We can imagine ML-based approaches to generative art, buying, curation, criticism, etc.
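As a concrete baseline for the generative-art role, and for the task's focus on conditional generation, here is a minimal procedural (not learned) generator in Python. It seeds a random number generator with a hypothetical token ID, so the same token always yields the same artwork, and conditions the palette on an input label. The `PALETTES` table, the `mood` labels, and the `generate_svg` layout are illustrative assumptions; an ML generative model conditioned on the same input could replace the hand-written rules.

```python
import random

# Hypothetical condition -> palette mapping (illustrative only)
PALETTES = {
    "calm": ["#264653", "#2a9d8f", "#8ecae6"],
    "energetic": ["#e63946", "#f4a261", "#ffb703"],
}


def generate_svg(token_id: int, mood: str, size: int = 256, n_shapes: int = 12) -> str:
    """Deterministically generate an SVG artwork from a token ID, conditioned on a mood."""
    rng = random.Random(token_id)   # same token ID -> same artwork
    palette = PALETTES[mood]        # the condition controls the colours
    shapes = []
    for _ in range(n_shapes):
        cx, cy = rng.randint(0, size), rng.randint(0, size)
        r = rng.randint(size // 20, size // 5)
        color = rng.choice(palette)
        shapes.append(f'<circle cx="{cx}" cy="{cy}" r="{r}" fill="{color}" fill-opacity="0.7"/>')
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
            + "".join(shapes) + "</svg>")


svg = generate_svg(42, "calm")
print(svg.count("<circle"))   # 12
```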

Problems:

Generating artworks with ML: a hacker could use machine learning techniques to generate NFT artworks that users find difficult to distinguish visually from legitimate works by the artist.
Manipulation for profit: financial trading firms commonly use machine learning to craft trading strategies.
1. Predicting NFT market movements to devise profitable strategies
2. Discovering ways to manipulate art markets, i.e., strategies to influence the trading behavior of market participants
3. Artists being unable to exercise more control over the sale of their work due to a lack of technical knowledge

Sentiment Analysis

Hota, H.S., Dinesh K. Sharma, and Nilesh Verma. ‘Lexicon-Based Sentiment Analysis Using Twitter Data’. In Data Science for COVID-19, 275–95. Elsevier, 2021. https://doi.org/10.1016/B978-0-12-824536-1.00015-0.

What
- to identify the human sentiment in terms of Negative, Neutral, or Positive
Lexicon-based
- Assumption: sentiment is related to the presence of certain words or phrases in the document
- lexicon: a set of features that have an assigned sentiment value
- Dictionary-based sentiment(labMT)
- Rule-based sentiment analysis tool(Vader)
- Senti-N-Gram lexicon
- Semantic similarity metric between text word and lexicon vocabulary
Machine learning (ML)-based
- Hassonah et al., SVM and evolutionary approach
- Pandey et al., metaheuristic-based method based on K-means and Cuckoo search to find optimum cluster-heads from the sentiment contents of the Twitter dataset
- Ankit et al., ensemble classification (Random Forest, Linear Regression, and SVM)
Lexicon-based vs. ML-based:
- SA. Mukhtar et al.: the lexicon-based approach outperforms the ML approach.
- Lexicon-based approach
  - limited coverage, since important lexical features may be missing; acquiring a new set of lexical features along with their valence scores is labor-intensive and time-consuming.
- ML-based approach
  - requires large amounts of training and test data, some features needed in a training sample are hard to obtain, and it is computationally expensive.
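The lexicon-based assumption can be made concrete with a toy scorer in Python. The mini-lexicon and negation list are hypothetical stand-ins for resources such as labMT or VADER. The normalization x / sqrt(x² + 15) and the ±0.05 cut-offs follow VADER's conventions for its compound score, but the full sign flip on negation is a simplification (VADER applies a damped flip rather than a full inversion).

```python
import math
import re

# Hypothetical mini-lexicon: word -> valence (a real study would use labMT, VADER, etc.)
LEXICON = {"love": 3.0, "great": 3.0, "good": 2.0, "scam": -3.0, "bad": -2.5, "overpriced": -2.0}
NEGATIONS = {"not", "no", "never"}


def lexicon_sentiment(text: str, alpha: float = 15.0) -> tuple[float, str]:
    """Sum word valences (flipping a word right after a negation), squash to (-1, 1), label."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            valence = LEXICON[tok]
            if i > 0 and tokens[i - 1] in NEGATIONS:
                valence = -valence
            total += valence
    score = total / math.sqrt(total * total + alpha)   # VADER-style normalization
    if score >= 0.05:
        return score, "Positive"
    if score <= -0.05:
        return score, "Negative"
    return score, "Neutral"


print(lexicon_sentiment("I love this NFT")[1])       # Positive
print(lexicon_sentiment("This drop is a scam")[1])   # Negative
print(lexicon_sentiment("The auction ends today"))   # (0.0, 'Neutral')
```

The sketch also shows the coverage weakness listed above: any opinion word missing from `LEXICON` contributes nothing.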

Methodology

Data collection

TWINT: a Python Twitter-scraping tool that does not use the official Twitter API

Data preprocessing

Removing noise
Convert to lower case
Tokenization
Removing stop words
Lemmatization
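The five preprocessing steps can be chained as in the following Python sketch. It uses only the standard library, so the stop-word list and the lemma table are small hypothetical stand-ins (libraries such as NLTK or spaCy supply full stop-word lists and real lemmatizers), and tokenization is plain whitespace splitting.

```python
import re

# Small illustrative stop-word list and lemma table (assumptions, not full resources)
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "this",
              "i", "my", "for", "on", "in", "from"}
LEMMAS = {"bought": "buy", "buying": "buy", "artists": "artist", "tokens": "token", "nfts": "nft"}


def preprocess(tweet: str) -> list[str]:
    text = re.sub(r"https?://\S+", " ", tweet)   # 1. remove noise: URLs,
    text = re.sub(r"@\w+", " ", text)            #    @mentions,
    text = re.sub(r"[^A-Za-z\s]", " ", text)     #    '#', digits, punctuation, emoji
    text = text.lower()                          # 2. convert to lower case
    tokens = text.split()                        # 3. tokenization (whitespace)
    tokens = [t for t in tokens if t not in STOP_WORDS]   # 4. remove stop words
    return [LEMMAS.get(t, t) for t in tokens]    # 5. lemmatization (dictionary lookup)


print(preprocess("I bought my first #NFT from @artist! https://t.co/xyz 🎨"))
# ['buy', 'first', 'nft']
```

Keeping the hashtag word while dropping the `#` symbol is a design choice here: hashtags such as `#NFT` often carry the topic signal that the later topic-modeling step needs.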

Topic modeling

Tong, Xin, Yixuan Li, Jiayi Li, Rongqi Bei, and Luyao Zhang. ‘What Are People Talking about in #BackLivesMatter and #StopAsianHate? Exploring and Categorizing Twitter Topics Emerging in Online Social Movements through the Latent Dirichlet Allocation Model’, 29 May 2022. https://doi.org/10.1145/3514094.3534202.

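A topic model is a statistical model that clusters the latent semantic structure of a document collection in an unsupervised way. Topic modeling is a natural language processing (NLP) technique for training machine learning models; it refers to the process of logically selecting the words in documents that belong to a given topic.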
Aim: to extract and summarize trending issues from documents
Variables:
- Latent variables: the document-topic distribution and the topic-word distribution
- Observed variables: the document-word distribution
Chain rule of conditional probability:
- The word distribution of the original documents, $P(W\mid D)$, is the training data, used to learn the two distributions $P(W\mid T)$ and $P(T\mid D)$
Latent Dirichlet Allocation (LDA)
- a probabilistic mixture model
- It finds word clusters by maximizing word co-occurrence probabilities, models the document-generation process with Dirichlet distributions, and fixes the number of topics, avoiding the overfitting and excessive parameter count of PLSA.
Gibbs Sampling Dirichlet Multinomial Mixture (GSDMM)
- Use case: better adapted to short texts such as tweets; inference uses collapsed Gibbs sampling
Hawkes model
- Hawkes models are most commonly applied to derived time series representing the occurrence of significant events (jumps/extreme changes) in the original time series, rather than to the original time series itself.
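A minimal Python sketch of a univariate Hawkes process with the common exponential kernel, λ(t) = μ + Σ_{t_i < t} α·e^{-β(t - t_i)}: every past event (for example, a spike in NFT-related tweets) temporarily raises the rate of new events. The sketch evaluates the intensity and simulates event times with Ogata's thinning method. The exponential kernel and all parameter values are illustrative assumptions, not fitted values; keeping α/β < 1 stops the process from exploding.

```python
import math
import random


def hawkes_intensity(t: float, events: list[float], mu: float, alpha: float, beta: float) -> float:
    """Baseline rate mu plus exponentially decaying excitation from events before t."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)


def simulate_hawkes(mu: float, alpha: float, beta: float, horizon: float, seed: int = 0) -> list[float]:
    """Ogata-style thinning; valid because this intensity only decays between events."""
    rng = random.Random(seed)
    events: list[float] = []
    t = 0.0
    while True:
        # Intensity just after t (including an event at exactly t) bounds it until the next event
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti <= t)
        t += rng.expovariate(lam_bar)        # candidate time from the bounding Poisson rate
        if t > horizon:
            return events
        if rng.random() * lam_bar <= hawkes_intensity(t, events, mu, alpha, beta):
            events.append(t)                 # accept with probability lambda(t) / lam_bar


events = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=50.0, seed=1)
print(len(events))
print(hawkes_intensity(10.0, [9.5, 9.8], mu=0.5, alpha=0.8, beta=1.2))  # above the 0.5 baseline
```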

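Learning and inference in LDA cannot be solved exactly, so Gibbs sampling and the variational EM algorithm are commonly used: the former is a Monte Carlo method, whereas the latter is a deterministic approximation algorithm.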
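The chain-rule note above can be written out explicitly. LDA, like pLSA, assumes that a word depends on its document only through the topic, which gives

$$P(W \mid D) = \sum_{T} P(W, T \mid D) = \sum_{T} P(W \mid T, D)\,P(T \mid D) = \sum_{T} P(W \mid T)\,P(T \mid D)$$

The following is a minimal collapsed Gibbs sampler for LDA in plain Python. It estimates exactly these two factors, $P(T\mid D)$ and $P(W\mid T)$, from token-topic counts. It is a teaching sketch: the hyperparameter defaults (`alpha`, `beta`, `n_iter`) are arbitrary, and production libraries (for example gensim's `LdaModel`, which uses online variational Bayes) are far faster.

```python
import random


def lda_gibbs(docs: list[list[str]], n_topics: int, alpha: float = 0.1,
              beta: float = 0.01, n_iter: int = 200, seed: int = 0):
    """Collapsed Gibbs sampling for LDA.
    Returns (doc_topic, topic_word): estimates of P(T|D) per document and P(W|T) per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]     # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in topics]          # times word v is assigned to topic k
    nk = [0] * n_topics                      # total tokens assigned to topic k
    z: list[list[int]] = []                  # current topic of every token

    for d, doc in enumerate(docs):           # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove this token's current assignment from the counts
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # p(z = j | all other assignments) ∝ (n_dj + alpha)(n_jv + beta) / (n_j + V*beta)
                weights = [(ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + V * beta)
                           for j in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    doc_topic = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
                 for d, doc in enumerate(docs)]
    topic_word = [{w: (nkw[k][w_id[w]] + beta) / (nk[k] + V * beta) for w in vocab}
                  for k in topics]
    return doc_topic, topic_word


docs = [["nft", "mint", "token", "blockchain", "nft", "mint"],
        ["token", "blockchain", "wallet", "nft", "mint", "wallet"],
        ["protest", "justice", "march", "rally", "justice", "march"],
        ["rally", "protest", "justice", "march", "protest", "rally"]]
theta, phi = lda_gibbs(docs, n_topics=2)
for k, dist in enumerate(phi):
    print(k, sorted(dist, key=dist.get, reverse=True)[:3])   # top words per topic
```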

Visual Analysis of Discrimination

Visual Analysis of Discrimination in Machine Learning - Qianwen Wang, Zhenhua Xu, Zhutian Chen, Yong Wang, Shixia Liu, and Huamin Qu

Discrimination on protected attributes: gender, race, age
- Naive method: hide the protected attributes → lowers accuracy and is ineffective (correlated attributes can still act as proxies)
Considering treatment raises two questions:
- which individuals should be regarded as similar for the task at hand.
- how to present discrimination among these similar individuals effectively
Existing methods: two aspects
- the data:
  - examine different treatments among similar items
- the model:
  - discriminatory predictions due to the data distribution and the model learning mechanism → quantify the model → effective visual presentation
DiscriLens: an interactive visualization tool that enables model users to flexibly interpret and analyze discrimination in ML.
- How: identifies a collection of potentially discriminatory itemsets based on causal modeling and classification rule mining.
- Combined views
  - (a) scatter plots offer an overview of the discriminatory itemsets;
  - (b) RippleSets reveal the intersections among these itemsets;
  - (c) the attribute matrix represents the details of each discriminatory itemset;
  - (d) the comparison mode enables users to compare two models side by side.
- Design Goals
  - 1 Customize the definition of discrimination.
  - 2 Measure the degree of discrimination.
  - 3 Identify the condition of discrimination.
  - 4 Depict the distribution of discrimination.
  - 5 Compare discrimination.
Related work
- FairSight
  - represents a workflow that supports the four fairness-aware actions (i.e., understand, measure, identify, and mitigate) required in decision making.
  - attributes
    - a) identifies discriminated instances based on a KNN algorithm
    - b) provides a global-level measure by aggregating over all instances.
  - Cons: fails to uncover the discriminatory itemsets and to reveal “when and where will a model yield discriminatory predictions”
- FairVis
  - more related to DiscriLens due to its focus on the analysis of discriminatory itemsets.
  - Cons:
    - focuses on suggesting similar discriminatory itemsets
    - No special visualizations designed for revealing the relationships among subgroups
Limitation
- not novice-friendly; steep learning curve
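Design goal 2 (measuring the degree of discrimination) can be illustrated with a simple itemset-level check. The sketch below is an assumption, not DiscriLens's exact definition. It groups records by an itemset, meaning a combination of non-protected attribute values, so that individuals within the itemset count as "similar". It then reports the gap in positive-prediction rates between protected groups inside each itemset. Attribute names and data are hypothetical.

```python
from collections import defaultdict


def itemset_discrimination(records, itemset_attrs, protected, prediction="pred", min_size=2):
    """For each itemset, return the gap between the highest and lowest positive-prediction
    rate across protected groups (groups smaller than min_size are ignored)."""
    groups = defaultdict(lambda: defaultdict(list))
    for r in records:
        key = tuple(r[a] for a in itemset_attrs)       # the itemset this record falls into
        groups[key][r[protected]].append(r[prediction])
    gaps = {}
    for key, by_group in groups.items():
        rates = {g: sum(p) / len(p) for g, p in by_group.items() if len(p) >= min_size}
        if len(rates) >= 2:                            # need at least two groups to compare
            gaps[key] = max(rates.values()) - min(rates.values())
    return gaps


rows = [("BSc", "40+", "M", 1), ("BSc", "40+", "M", 1), ("BSc", "40+", "F", 1), ("BSc", "40+", "F", 0),
        ("HS", "<40", "M", 0), ("HS", "<40", "M", 0), ("HS", "<40", "F", 0), ("HS", "<40", "F", 0)]
records = [dict(education=e, hours=h, gender=g, pred=p) for e, h, g, p in rows]
print(itemset_discrimination(records, ["education", "hours"], protected="gender"))
# {('BSc', '40+'): 0.5, ('HS', '<40'): 0.0}
```

Scanning such gaps across many itemsets is one way to locate "when and where" a model's predictions diverge between groups, the question the notes say FairSight leaves open.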

Others

HCI Summer research - NFTs Art discrimination, trust and awareness