GPT-2
Generative Pre-trained Transformer 2 (GPT-2) is an open-source artificial intelligence large language model created by OpenAI in February 2019.[2][3][4][5] GPT-2 translates text, answers questions, summarizes passages,[6] and generates text output on a level that, while sometimes indistinguishable from that of humans,[7] can become repetitive or nonsensical when generating long passages.[8] It is a general-purpose learner; it was not specifically trained to perform any of these tasks, and its ability to perform them is an extension of its general ability to accurately synthesize the next token in an arbitrary sequence.[9][6] GPT-2 was created as a "direct scale-up" of OpenAI's 2018 GPT model ("GPT-1"),[10] with a tenfold increase in both its parameter count and the size of its training dataset.[5] GPT-2 has a generative pre-trained transformer architecture which implements a deep neural network, specifically a transformer model,[10] which uses attention in place of previous recurrence- and convolution-based architectures.[11][12] Attention mechanisms allow the model to selectively focus on the segments of input text it predicts to be the most relevant.[13][14] This architecture allows for greatly increased parallelization, and outperforms previous benchmarks set by RNN/CNN/LSTM-based models.[10]
OpenAI released the complete version of the GPT-2 language model (with 1.5 billion parameters) in November 2019.[15]
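The attention mechanism mentioned above can be illustrated with a minimal sketch of a single masked (causal) self-attention head, the building block of a decoder-only transformer. This is an illustrative example, not OpenAI's code; the weight matrices here are random placeholders.

```python
import torch
import torch.nn.functional as F

def causal_self_attention(x, w_q, w_k, w_v):
    """One attention head with a causal mask: each position may only attend
    to itself and earlier positions, as in a decoder-only transformer."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v               # project input to queries, keys, values
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))  # hide future positions
    weights = F.softmax(scores, dim=-1)               # attention weights over earlier tokens
    return weights @ v

# Example: a sequence of 5 tokens, each represented by a 64-dimensional state
x = torch.randn(5, 64)
w_q, w_k, w_v = (torch.randn(64, 64) for _ in range(3))
out = causal_self_attention(x, w_q, w_k, w_v)         # shape (5, 64)
```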
Original GPT architecture
Google invented the transformer architecture in 2017.[16] Building on that development, in 2018 OpenAI released a paper entitled "Improving Language Understanding by Generative Pre-Training",[10] in which they introduced the concept of a generative pre-trained transformer and the first model that became known as GPT-1.[17] Up to that point, the best-performing neural NLP models had primarily used supervised learning from large amounts of manually labeled data. This reliance on supervised learning limited their use on datasets that were not well-annotated, in addition to making it prohibitively expensive and time-consuming to train extremely large models;[10][18] many languages (such as Swahili or Haitian Creole) were difficult to translate and interpret using such models due to a lack of available text for corpus-building.[18] In contrast, a GPT's "semi-supervised" approach involved two stages: an unsupervised generative "pre-training" stage in which a language modeling objective was used to set initial parameters, and a supervised discriminative "fine-tuning" stage in which these parameters were adapted to a target task.[10] The use of a transformer architecture, as opposed to previous techniques involving attention-augmented RNNs, provided GPT models with a more structured memory than could be achieved through recurrent mechanisms; this resulted in "robust transfer performance across diverse tasks".[10]
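In standard autoregressive language-modelling notation (a common formulation, not a quotation from the paper), the unsupervised pre-training stage maximizes the likelihood of each token given its preceding context:

```latex
L(\mathcal{U}) = \sum_{i} \log P\!\left(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta\right)
```

where U = (u_1, ..., u_n) is the token corpus, k is the context window size, and Θ are the model parameters; the supervised fine-tuning stage then adapts Θ to a labeled target task.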
Corpus
The unsupervised pre-training was performed using "Common Crawl" (a massive dataset of web pages)[19] and BookCorpus,[20] a dataset of over 7,000 unpublished fiction books from various genres; this dataset was chosen in part because its long passages of continuous text trained the model to handle long-range information. Other available datasets, while larger, were rejected on the basis that they lacked this long-range structure (being "shuffled" at a sentence level).[10] The ftfy library was used to clean the BookCorpus text (standardizing punctuation and whitespace); it was tokenized using spaCy.[10]
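A minimal sketch of the kind of preprocessing described above, pairing ftfy (text repair and normalization) with a bare spaCy tokenizer; the function and the English-only tokenizer are assumptions for illustration, not OpenAI's actual pipeline.

```python
import ftfy
import spacy

# A bare English tokenizer with no tagger, parser, or other pipeline components.
nlp = spacy.blank("en")

def clean_and_tokenize(raw_text: str) -> list[str]:
    # ftfy repairs mojibake and standardizes punctuation and whitespace;
    # spaCy then splits the cleaned text into tokens.
    fixed = ftfy.fix_text(raw_text)
    return [token.text for token in nlp(fixed)]
```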
Architecture
The GPT-1 architecture itself was a twelve-layer decoder-only transformer, using twelve masked self-attention heads with 64-dimensional states each (for a total of 768). Rather than simple stochastic gradient descent, the Adam optimization algorithm was used; the learning rate was increased linearly from zero over the first 2,000 updates to a maximum of 2.5×10−4, and annealed to zero using a cosine schedule.[10] While the fine-tuning was adapted to specific tasks, its pre-training was not; to perform the various tasks, minimal changes were made to its underlying task-agnostic model architecture.[10] Despite this, GPT-1 still improved on previous benchmarks in several language processing tasks, outperforming discriminatively-trained models with task-oriented architectures on a number of diverse tasks.[10]
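The learning-rate schedule described above (linear warmup from zero over the first 2,000 updates to 2.5×10−4, then cosine annealing back to zero) can be sketched as a simple function of the training step; this is illustrative, not OpenAI's training code.

```python
import math

def gpt1_learning_rate(step: int, total_steps: int,
                       warmup_steps: int = 2000, max_lr: float = 2.5e-4) -> float:
    # Linear warmup from zero over the first `warmup_steps` updates...
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    # ...then cosine annealing from max_lr down to zero over the rest of training.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * progress))
```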
Performance
On natural language inference (also known as textual entailment) tasks, models are evaluated on their ability to interpret pairs of sentences from various datasets and classify the relationship between them as "entailment", "contradiction" or "neutral".[10] Examples of such datasets include QNLI (Wikipedia articles) and MultiNLI (transcribed speech, popular fiction and government reports, among other sources);[21] on these, GPT-1 achieved, respectively, a 5.8% and 1.5% improvement over previous best results.[10] It similarly outperformed previous models on two tasks related to question answering and commonsense reasoning: by 5.7% on RACE,[22] a dataset of written question–answer pairs from middle and high school exams, and by 8.9% on the Story Cloze Test.[23] Another task, semantic similarity (or paraphrase detection), assesses whether a model can predict whether two sentences are paraphrases of one another; on the Quora Question Pairs (QQP) dataset, GPT-1 improved on previous best-performing models by 4.2%.[10] In a text classification task using the Corpus of Linguistic Acceptability (CoLA), GPT-1 achieved a score of 45.4, versus a previous best of 35.0. Finally, on GLUE, a multi-task test,[24] GPT-1 achieved an overall score of 72.8 (compared to a previous record of 68.9).[10]
Scale-up from GPT-1 to GPT-2
GPT-2 was created as a direct scale-up of GPT-1, with both its parameter count and dataset size increased by a factor of 10.[9][10][5] Both are unsupervised transformer models trained to generate text by predicting the next word in a sequence of tokens. The GPT-2 model has 1.5 billion parameters, and was trained on a dataset of 8 million web pages.[9] While GPT-2 was built on very simple criteria (interpreting a sequence of words in a text sample and predicting the most likely next word), it produces full sentences and paragraphs by continuing to predict additional words, generating fully comprehensible (and semantically meaningful) statements in natural language.[9] Notably, GPT-2 was evaluated on its performance on tasks in a zero-shot setting.
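The "predict the most likely next word, append it, and repeat" loop described above can be sketched with Hugging Face's publicly available re-implementation of GPT-2 (a later community library and checkpoint, not the code OpenAI originally released); this uses greedy decoding for simplicity.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Greedy autoregressive decoding: repeatedly take the highest-probability
# next token and append it to the running sequence.
input_ids = tokenizer.encode("The transformer architecture was", return_tensors="pt")
with torch.no_grad():
    for _ in range(40):
        logits = model(input_ids).logits              # shape (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()              # most likely next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```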
Training
Since the transformer architecture enabled massive parallelization, GPT models could be trained on larger corpora than previous NLP models. While the GPT-1 model demonstrated that the approach was viable, GPT-2 would further explore the emergent properties of networks trained on extremely large corpora. CommonCrawl, a large corpus produced by web crawling and previously used in training NLP systems,[25] was considered due to its large size, but was rejected after further review revealed large amounts of unintelligible content.[9][25] Instead, OpenAI developed a new corpus, known as WebText; rather than scraping content indiscriminately from the World Wide Web, WebText was generated by scraping only pages linked to by Reddit posts that had received at least three upvotes prior to December 2017. The corpus was subsequently cleaned; HTML documents were parsed into plain text, duplicate pages were eliminated, and Wikipedia pages were removed (since their presence in many other datasets could have induced overfitting).[9] While the cost of training GPT-2 is known to have been $256 per hour,[26][27] the number of hours it took to complete training is unknown; therefore, the overall training cost cannot be estimated accurately.[28] However, comparable large language models using transformer architectures have had their costs documented in more detail; the training processes for BERT and XLNet consumed, respectively, $6,912 and $245,000 of resources.[27]
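A sketch of the WebText selection rules described above (at least three upvotes, de-duplication, Wikipedia pages excluded); the record fields and the function itself are hypothetical, for illustration only, and not OpenAI's scraper.

```python
from urllib.parse import urlparse

def select_webtext_urls(reddit_links):
    """Yield outbound URLs that pass WebText-style filters.
    Each item in `reddit_links` is assumed to be a dict with
    'url' and 'score' (upvote count) keys."""
    seen = set()
    for link in reddit_links:
        url, score = link["url"], link["score"]
        if score < 3:                                   # require at least 3 upvotes
            continue
        if "wikipedia.org" in urlparse(url).netloc:     # exclude Wikipedia pages
            continue
        if url in seen:                                 # remove duplicate pages
            continue
        seen.add(url)
        yield url
```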
Performance
GPT-2 wrote a fictional news article about Edward Snowden's actions after winning the 2020 United States presidential election (all highlighted text is machine-generated). While Snowden had (at the time of generation) never been elected to public office, the generated sample was grammatically and stylistically valid. GPT-2 became capable of performing a variety of tasks beyond simple text production due to the breadth of its dataset and technique: answering questions, summarizing, and even translating between languages in a variety of specific domains, without being instructed in anything beyond how to predict the next word in a sequence.[29][30] One example of generalized learning is GPT-2's ability to perform machine translation between French and English, for which GPT-2's performance was assessed using the WMT-14 translation tasks. GPT-2's training corpus included virtually no French text; non-English text was deliberately removed while cleaning the dataset prior to training, and as a consequence, only 10MB of French text out of the remaining 40,000MB was available for the model to learn from (mostly from foreign-language quotations in English posts and articles).[9] Despite this, GPT-2 achieved 5 BLEU on the WMT-14 English-to-French test set (slightly below the score of a translation via word-for-word substitution). It was also able to outperform several contemporary (2017) unsupervised machine translation baselines on the French-to-English test set, where GPT-2 achieved 11.5 BLEU. This remained below the highest-performing contemporary (2019) unsupervised approach, which had achieved 33.5 BLEU.[9] However, other models used large amounts of French text to achieve these results; GPT-2 was estimated to have used a monolingual French corpus approximately 1/500 the size of comparable approaches.[9]
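Figures such as the 5 and 11.5 BLEU scores above come from comparing model output against reference translations at the corpus level. A brief sketch using the sacrebleu library, one common tool for this kind of evaluation (not necessarily the tooling used for the paper's numbers):

```python
import sacrebleu

# Model translations and reference translations for the same source sentences.
hypotheses = ["the cat sits on the mat", "he went to the market yesterday"]
references = [["the cat is sitting on the mat", "he went to the market yesterday"]]

# corpus_bleu takes a list of hypotheses and a list of reference streams,
# each aligned with the hypotheses.
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")
```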
Model | Architecture | Parameter count | Training data
GPT-1 | 12-level, 12-headed Transformer decoder (no encoder), followed by linear-softmax. | 0.12 billion | BookCorpus:[31] 4.5 GB of text, from 7,000 unpublished books of various genres.
GPT-2 | GPT-1, but with modified normalization. | 1.5 billion | WebText: 40 GB of text, 8 million documents, from 45 million webpages upvoted on Reddit.
GPT-3 | GPT-2, but with modifications to allow larger scaling. | 175 billion | 570 GB plaintext, 0.4 trillion tokens. Mostly CommonCrawl, WebText, English Wikipedia, and two books corpora (Books1 and Books2).
GPT-2 was to be followed by the 175-billion-parameter GPT-3,[32] revealed to the public in 2020[33] (whose source code has never been made available). Access to GPT-3 is provided exclusively through APIs offered by OpenAI and Microsoft.[34] That was then later followed by GPT-4.
Release
GPT-2 was first announced on 14 February 2019. A February 2019 article in The Verge by James Vincent said that, while "[the] writing it produces is usually easily identifiable as non-human", it remained "one of the most exciting examples yet" of language-generation programs:[29]
Give it a fake headline, and it'll write the rest of the article, complete with fake quotations and statistics. Feed it the first line of a short story, and it'll tell you what happens to your character next. It can even write fan fiction, given the right prompts.[29]
The Guardian described this output as "plausible newspaper prose";[8] Kelsey Piper of Vox said "one of the coolest AI systems I've ever seen may also be the one that will kick me out of my job".[30] GPT-2's flexibility was described as "impressive" by The Verge; specifically, its ability to translate text between languages, summarize long articles, and answer trivia questions was noted.[29] A study by the University of Amsterdam employing a modified Turing test found that at least in some scenarios, participants were unable to distinguish poems generated by GPT-2 from those written by humans.[35]
Limitations and partial release
While "Skub" is not a real product, even the reduced-size model used in DistilGPT2 is capable of creating plausible arguments both for and against it. While previous OpenAI models had been made immediately available to the public, OpenAI initially refused to make a public release of GPT-2's source code when announcing it in February, citing the risk of malicious use;[8] limited access to the model (i.e. an interface that allowed input and provided output, not the source code itself) was permitted for selected press outlets on announcement.[8] One commonly-cited justification was that, since generated text was usually completely novel, it could be used by spammers to evade automated filters; OpenAI demonstrated a version of GPT-2 fine-tuned to "generate infinite positive – or negative – reviews of products".[8] Another justification was that GPT-2 could be used to generate text that was abhorrent or racist. Researchers such as Jeremy Howard warned of "the technology to totally fill Twitter, email, and the web up with reasonable-sounding, context-appropriate prose, which would drown out all other speech and be impossible to filter".[29] The Allen Institute for Artificial Intelligence, in response to GPT-2, announced a tool to detect "neural fake news".[36] However, opinion was divided. A February 2019 article in The Verge argued that the threat posed by GPT-2 had been overstated;[37] Anima Anandkumar, a professor at Caltech and director of machine learning research at Nvidia, said that there was no evidence that GPT-2 had the capabilities to pose the threats described by OpenAI, and that what they did was the "opposite of open", characterizing their refusal to release the full model as "malicious BS".[37] The Gradient published an open letter to OpenAI requesting that they release the model publicly, comparing the threat posed by text-generation AI to the threat posed by the printing press, and giving Photoshop as an example of "a technology that has (thankfully) not destroyed modern society despite its potential for chaos":[38]
Thirty years later, society has emerged relatively unscathed despite Photoshop being simple enough for high school students to use and ubiquitous enough to commandeer its own verb. Why? Precisely because everyone knows about Photoshop.[38]
774M release
While OpenAI did not release the fully-trained model or the corpora it was trained on, descriptions of their methods in prior publications (and the free availability of the underlying technology) made it possible for GPT-2 to be replicated by others as free software; one such replication, OpenGPT-2, was released in August 2019, in conjunction with a freely licensed version of WebText called OpenWebText. The cloud compute costs for OpenGPT-2 were given as approximately $50,000.[39] On August 20, 2019, OpenAI released a partial version of GPT-2, with 774 million parameters (roughly half the size of the full 1.5 billion parameter model).[3]
Full 1.5B release
Initial concerns that GPT-2 would lend itself to widespread misuse did not come to pass; The Verge said that "there are reasons to be skeptical about claims that AI technology will usher in some kind of 'infopocalypse.' For a start, we already have programs that can generate plausible text at high volume for little cost: humans."[40] By November 2019, OpenAI said that they had "seen no strong evidence of misuse so far", and the full version, with 1.5 billion parameters, was released on November 5, 2019.[4][15]
Limitations
GPT-2 can generate thematically appropriate text for a range of scenarios, even surreal ones like a CNN article about Donald Trump giving a speech praising the anime character Asuka Langley Soryu. Here, the tendency to generate nonsensical and repetitive text with increasing output length (even in the full 1.5B model) can be seen; in the second paragraph, grammar begins to deteriorate, and the output eventually becomes one incoherent sentence repeated over and over. While GPT-2's ability to generate plausible passages of natural language text was generally remarked on positively, its shortcomings were noted as well, particularly when generating texts longer than a couple of paragraphs; Vox said "the prose is pretty rough, there's the occasional non-sequitur, and the articles get less coherent the longer they get".[30] The Verge similarly noted that longer samples of GPT-2 writing tended to "stray off topic" and lack overall coherence;[29] The Register opined that "a human reading it should, after a short while, realize something's up", and noted that "GPT-2 doesn't answer questions as well as other systems that rely on algorithms to extract and retrieve information."[26] GPT-2 deployment is resource-intensive; the full version of the model is larger than five gigabytes, making it difficult to embed locally into applications, and it consumes large amounts of RAM. In addition, performing a single prediction "can occupy a CPU at 100% utilization for several minutes", and even with GPU processing, "a single prediction can take seconds".[7] To address these issues, the company Hugging Face created DistilGPT2, using knowledge distillation to produce a smaller model that "scores a few points lower on some quality benchmarks", but is "33% smaller and twice as fast".[7]
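The size reduction achieved by distillation can be checked directly with Hugging Face's transformers library; "gpt2" and "distilgpt2" are the checkpoint names published on the Hugging Face hub, and the comparison below is against the smallest published GPT-2 checkpoint, not the full 1.5B model. A brief sketch:

```python
from transformers import AutoModelForCausalLM

# Compare parameter counts of the smallest GPT-2 checkpoint and the distilled model.
for name in ("gpt2", "distilgpt2"):
    model = AutoModelForCausalLM.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.0f}M parameters")
```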
Implementations and subsequent research
Possible applications of GPT-2 described by journalists included aiding humans in writing text such as news articles.[8] Even before the release of the full version, GPT-2 was used for a variety of applications and services, as well as for entertainment. In June 2019, a subreddit named r/SubSimulatorGPT2 was created in which a variety of GPT-2 instances trained on different subreddits made posts and replied to each other's comments, creating a situation where one could observe "an AI personification of r/Bitcoin argue with the machine learning-derived spirit of r/ShittyFoodPorn";[40] by July of that year, a GPT-2-based software program that autocompletes lines of code in a variety of programming languages was described by users as a "game-changer".[41] In 2019, AI Dungeon was launched, which used GPT-2 to generate dynamic text adventures based on user input.[42] AI Dungeon now offers access to the largest release of the GPT-3 API as an optional paid upgrade; the free version of the site uses the second-largest release of GPT-3.[43] Latitude, the company formed around AI Dungeon, raised $3.3 million in seed funding in 2021.[44] Several websites host interactive demonstrations of different instances of GPT-2 and other transformer models.[45][46][47] In February 2021, a crisis center for troubled teens announced that they would begin using a GPT-2-derived chatbot to help train counselors by allowing them to have conversations with simulated teens (this use was purely for internal purposes, and did not involve having GPT-2 communicate with the teens themselves).[48] On May 9, 2023, OpenAI released a mapped version of GPT-2. OpenAI used its successor model, GPT-4, to map each neuron of GPT-2 in order to determine its function.[49]
References