In linguistics, a corpus (plural corpora) or text corpus is a large and structured ... linguistic rules on a specific universe. A corpus may contain texts in a single language (monolingual corpus) or text data in multiple languages (multilingual corpus). Multilingual corpora that have ... by their find site dates. Some notable text corpora. English language: American National Corpus; Bank of English; British National Corpus; Corpus Juris Secundum; Corpus of Contemporary American English (COCA), 400 million words, 1990 to present, freely searchable online; Brown Corpus, forming part of the Brown Family of corpora together with the LOB Corpus, Frown and F-LOB; International Corpus of English; Oxford English Corpus; Scottish Corpus of Texts & Speech. Other languages: Hamshahri Corpus (Persian) ... known as annotation. An example of annotating a corpus is part-of-speech tagging, or POS tagging, in which information about each word's part of speech (verb, noun, adjective, etc.) is added to the corpus ... of each word. When the language of the corpus is not a working language of the researchers who use it, interlinear ... that the entire corpus is completely and consistently annotated means that these corpora are usually ... . Corpora are the main knowledge base in corpus linguistics. The analysis and processing of various ... teaching. Archaeological corpora: Text corpora are also used in the study of historical document ... of the shortest corpora in time, may be the 15-30 year Amarna letters texts (1350 BC). The corpus of an ancient ...
English-Persian Parallel Corpus (http://ece.ut.ac.ir/nlp); TMC (Tehran Monolingual Corpus), a standard corpus for Persian language modeling (http://ece.ut.ac.ir/nlp); Bijankhan Corpus, a contemporary Persian corpus for NLP research; CETENFolha; Croatian National Corpus; Czech National Corpus; Neo-Assyrian Text Corpus Project; Russian National Corpus; Slovenian National Corpus; Thesaurus Linguae Graecae (Ancient Greek); Quranic Arabic Corpus (Classical Arabic); Eastern Armenian National Corpus (EANC), 110 million words ...
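The part-of-speech annotation mentioned in this entry can be pictured as pairing each token with a tag. The following is a minimal sketch only, assuming a tiny hand-written lexicon: the tag names (DET, NOUN, VERB, ADJ, UNK), the lexicon contents and the function `tag_tokens` are hypothetical illustrations, not the tag set or tooling of any corpus listed here.

```python
# Toy illustration of POS annotation: each token is paired with a tag.
# The lexicon and the tag names are assumptions invented for this sketch.
TOY_LEXICON = {
    "the": "DET",
    "a": "DET",
    "corpus": "NOUN",
    "text": "NOUN",
    "contains": "VERB",
    "large": "ADJ",
}


def tag_tokens(tokens, lexicon=TOY_LEXICON, unknown="UNK"):
    """Return (token, tag) pairs; tokens missing from the lexicon get `unknown`."""
    return [(tok, lexicon.get(tok.lower(), unknown)) for tok in tokens]


annotated = tag_tokens("The corpus contains a large text".split())
for token, tag in annotated:
    print(f"{token}/{tag}")
```

Real taggers resolve ambiguous words from context; a plain lookup like this one only shows the shape of the annotated data.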
In the Neo-Assyrian Text Corpus Project, the following works are published.
State Archives of Assyria Cuneiform Texts (SAACT):
SAACT Volume I. The Standard Babylonian Epic of Gilgamesh, by Simo Parpola, 1997.
SAACT Volume II. The Standard Babylonian Etana Epic, by Jamie R. Novotny, 2001.
State Archives of Assyria Studies (SAAS):
SAAS Volume I. Neuassyrische Glyptik des 8.-7. Jh. v. Chr. unter besonderer Berücksichtigung der Siegelungen auf Tafeln und Tonverschlüsse, by Suzanne Herbordt, 1992.
SAAS Volume II. The Eponyms of the Assyrian Empire 910-612 BC, by Alan Millard, 1994.
SAAS Volume III. The Use of Numbers and Quantifications in the Assyrian Royal Inscriptions, by Marco De Odorico, 1995.
SAAS Volume IV. Nippur in Late Assyrian Times c. 755-612 BC, by Steven W. Cole, 1996.
SAAS Volume V. Neo-Assyrian Judicial Procedures, by Remko Jas, 1996.
SAAS Volume VI. Die neuassyrischen Privatrechtsurkunden als Quelle für Mensch und Umwelt, by Karen Radner, 1997.
SAAS Volume VII. References to Prophecy in Neo-Assyrian Sources, by Martti Nissinen, 1998.
SAAS Volume VIII. Die Annalen des Jahres 711 v. Chr. nach Prismenfragmenten aus Nineve und Assur, by Andreas Fuchs, 1998.
SAAS Volume IX. The Role of Naqia/Zakutu in Sargonid Politics, by Sarah C. Melville, 1999.
SAAS Volume X. Herrschaftswissen in Mesopotamien: Formen der Kommunikation zwischen Gott und König im 2. und 1. Jahrtausend ...
See also: List of eponyms; Text corpus.
References: Cole, S. Nippur in Late Assyrian Times, c. 755-612 BC, by Steven W. Cole, The Neo-Assyrian Text Corpus Project, University of Helsinki, by Vammalan Kirjapaino ...
Corpus (Latin plural corpora; English plural corpuses or corpora) is Latin for "body". It may refer to:
Corpus Christi (disambiguation); Corpus, the figure of Christ on a crucifix; Corpus linguistics; Text corpus, in linguistics, a large and structured set of texts; Speech corpus, in linguistics, a large set of speech audio files.
Law: Habeas corpus, a legal mechanism to end detention of a suspect; Corpus delicti, a legal term meaning "body of the crime".
Biology: Corpus callosum, a structure in the brain; Corpus cavernosum (disambiguation), a pair of structures in human genitals; Corpus luteum, a temporary endocrine structure in mammals; Corpus gastricum, the Latin term referring to the body of the stomach.
Writings, including medical and legal: Hippocratic Corpus, the lectures and writings of Hippocrates; Corpus Inscriptionum Etruscarum, an index of Etruscan texts; Corpus Reformatorum, a collection of Reformation writings; an abbreviation for the Corpus Juris Civilis, a collection of four books on law by Justinian I.
Arts: Corpus (band), a punk band from Sydney, Australia; Corpus (album), by Sebastian Santa Maria; Corpus Delicti (band), also known simply as Corpus; Corpus Callosum, a 2007 film; Corpus (sculpture), a sculpture of Christ by Gian Lorenzo Bernini; Corpus (museum), a human body themed museum in the Netherlands; The Corpus Clock, a large sculptural clock; Corpus (dance troupe), a Canadian dance troupe.
Other: Corpus separatum, a 1947 UN Partition Plan for the Holy Land.
Text may refer to:
Text (literary theory), a concept in literary theory; Text (song), a 2010 song by Mann featuring Jason Derülo; TEXT, a Swedish band formed by 3/4 ex-Refused members; TEXT (record label), the independent record label of electronica artist Four Tet; Textbook, a standardized instructional book; Text display, an electronic alphanumeric display device; Text file, a computer file consisting solely of printable characters from a recognized character set; Text messaging, the sending of short messages by mobile phone; Text segment, another name for the code segment of a binary executable computer file; a particular Bible passage, sometimes a single verse or verse fragment; another name for a literary work; the representation of written language; TxT (film), a 2006 Filipino horror film.
See also: Enriched text; Formatted text; Plain text.
TEXT is the band founded by Kristofer Steen, David Sandström, Fredrik Bäckström and Jon F Brännström. All except Bäckström were ex-members of the hardcore band Refused. Stylistically, they have little in common with Refused apart from this fact. Their debut album, the self-titled Text, is a mix of spoken word, music of various styles, and ambient sound effects, often producing an ethereal, avant-garde sound. Apart from the three Tableau tracks (which are one piece, split up across the album), each track could be described as fitting into a different genre. In 2008, a second album, Vital Signs, was released. Yet again the style of music is far from both Refused and the first Text album. Only Fredrik Bäckström and Jon F Brännström appear on this album. The record came out on Demonbox Recordings in Sweden and on Buddyhead in America and the rest of the world. Text was Buddyhead 4 and is considered a building block in what is now a successful and diverse indie boutique label run by music journalist Travis Keller. Text announced a US tour the year after the record was released on Buddyhead, but it was canceled due to conflicts with International Noise Conspiracy tours.
Discography: TEXT (self-titled)
1. Requiem for Ernst Hugo 1928-1998 (1:09). Recording info: Vocals by David Sandström, Martin Eirell and Fredrik Bäckström. Recorded on September 7, 1998 and mixed in June 1999 at Second Home by Fredrik Bäckström, Henrik Oja, David Sandström and Kristofer Steen. Recorded and mixed June 1999 at Second Home. Mastered at Tonteknik by Henrik Oja, Pelle Henricsson and Eskil Lövström.
2. Sound Is Compressed Words Rebels And Hiss (11:04). Recording info: Piano, Organ: Anders ... Words Rebel And Hiss (single): Sound Is Compressed Words Rebel And Hiss (3:55). TEXT: Vital Signs ...
Unofficial TEXT web site: http web.telia.com u35503769 TEXT lyrics&sounds.html ...
The Bijankhan corpus is a tagged corpus that is suitable for natural language processing research on the Persian language. The collection is gathered from daily news and common texts. All documents in it are categorized by subject (political, cultural, etc.) into about 4300 different subject categories. The corpus contains about 2.6 million manually tagged words with a tag set that contains 550 Persian part-of-speech tags. The Bijankhan corpus was created by the Database Research Group (http://ece.ut.ac.ir/dbrg) at the University of Tehran. The corpus is free for non-commercial use only, although these restrictions vary by country. It is named after Prof. M. Bijankhan of the Faculty of Literature & Human Science at the University of Tehran, in recognition of his contributions in this area. See also: Hamshahri Corpus; Persian Today Corpus. External links: Bijankhan corpus, http://ece.ut.ac.ir/dbrg/Bijankhan ...
The Hamshahri Corpus is a sizable Persian corpus based on the Iranian newspaper Hamshahri, one of the first online Persian-language newspapers in Iran. It was initially collected and compiled by Ehsan Darrudi at the DBRG Group (http://ece.ut.ac.ir/dbrg) of the University of Tehran. The corpus was created by crawling the online news articles from Hamshahri's website and processing the HTML pages to create a standard text corpus for modern information retrieval experiments.
Version 1.0: The collection contains more than 160,000 articles covering the following subject categories: politics, city news, economics, reports, editorials, literature, sciences, society, foreign news, sports, etc. The size of the documents varies from short news items (under 1 KB) to rather long articles (e.g. 140 KB), with an average of 1.8 KB. The corpus is available for download (http://ece.ut.ac.ir/dbrg/Hamshahri) in several formats: tagged text (560 MB) and SQL Server 2000 tables (712 MB).
Version 2.0: The second release of the Hamshahri Corpus was released on October 20, 2008. It offers several new features and improvements:
More news: 323,616 text stories in 3206 XML files (a file for each day).
Increased time span: 1996-06-22 to 2007-05-13 (Hejri Shamsi 1375-04-02 to 1386-02-23).
Bigger in size: 1.42 GB uncompressed.
Standard container: Unicode XML.
Included images: images have been extracted from the news and preserved (available in an additional package), which makes the corpus suitable for image retrieval tasks.
Categorized news: the news stories have been categorized semi-automatically, which is appropriate for text categorization and classification tasks.
The corpus is available for download in XML format (http://ece.ut.ac.ir/DBRG/Hamshahri/ham2).
See also: Bijankhan Corpus; Persian Today Corpus; Text corpus; Information Retrieval. External links: Hamshahri Corpus Homepage, http://ece.ut.ac.ir/dbrg/Hamshahri; DBRG Group Website, http://ece.ut.ac.ir/dbrg; http://hamshahree.com DBRG ...
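The step of processing crawled HTML pages into plain corpus text, as described for this corpus, might look roughly like the sketch below. It is not the DBRG pipeline, whose details are not given here; it only assumes that visible text should be kept and that script and style contents should be dropped. The class `TextExtractor`, the function `html_to_text` and the sample page are hypothetical, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        words = data.split()   # also normalizes runs of whitespace
        if self._skip_depth == 0 and words:
            self._chunks.append(" ".join(words))

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the visible text of one HTML page as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Invented sample page standing in for one crawled news article.
sample_page = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>City news</h1><p>Short   story.</p></body></html>"
)
plain = html_to_text(sample_page)
print(plain)
```

A production crawler would also need encoding detection, boilerplate removal (menus, adverts) and per-article metadata, none of which is attempted here.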
Corpus cavernosum can refer to: corpus cavernosum clitoridis; corpus cavernosum penis; corpus cavernosum urethrae (used for corpus spongiosum in older texts); corpus cavernosum conchae.
A speech corpus (or spoken corpus) is a database of speech audio files and text transcriptions. In speech technology, speech corpora are used, among other things, to create acoustic models, which can then be used with a speech recognition engine. In linguistics, spoken corpora are used for research into phonetics, conversation analysis, dialectology and other fields. A corpus is one such database; corpora, the plural of corpus, refers to many such databases.
There are two types of speech corpora:
1. Read speech, which includes book excerpts, broadcast news, lists of words, and sequences of numbers.
2. Spontaneous speech, which includes dialogs between two or more people (including meetings); narratives, in which a person tells a story (one such corpus is the Buckeye Corpus); map tasks, in which one person explains a route on a map to another; and appointment tasks, in which two people try to find a common meeting time based on individual schedules.
A special kind of speech corpus is the non-native speech database, which contains speech with a foreign accent.
See also: Transcription (linguistics); EXMARaLDA; Praat; Transcriber.
References: Edwards, Jane; Lampert, Martin (eds.) (1992). Talking Data: Transcription and Coding in Discourse Research. Hillsdale: Erlbaum. Leech, Geoffrey; Myers, Greg; Thomas, Jenny (eds.) (1995). Spoken English on Computer: Transcription, Markup and Application. Harlow: Longman.
External links: Santa Barbara Corpus of Spoken American English, http://www.linguistics.ucsb.edu/research/sbcorpus.html; The Buckeye Corpus of Conversational Speech, http://buckeyecorpus.osu.edu; ISIP's Switchboard database, http://www.ece.msstate.edu/research/isip/projects/switchboard; Spoken Language Corpora at the Research Center on Multilingualism, http www.exmaralda.org corpora en sfbkorpora.html; The Spoken Turkish Corpus at METU Ankara, http://std.metu.edu.tr/en; VoxForge, http://www.voxforge.org ...
Corpus linguistics is the study of language as expressed in samples (corpora) or "real world" text. This method represents a digestive approach to deriving a set of abstract rules by which a natural ... corpus linguists who work with unannotated plain text inevitably apply some method to isolate terms ... Text corpus; Translation memory; Treebank; Xaira, a general-purpose XML-aware open source corpus analysis ... formerly Tenka Text, an open source (GPL'ed) corpus analysis tool written in C; http www.ucl.ac.uk english ... text mining discussion group ... couldn't find one. The corpus approach runs counter to Noam Chomsky's view that real language is riddled ... measure of the ethnographic representativity of their data. Corpus ... contexts and with minimal experimental interference. Within corpus linguistics there are divergent views as to the value of corpus annotation, from John Sinclair (Sinclair, J. "The automatic analysis of corpora", in Svartvik, J. (ed.) Directions in Corpus Linguistics, Proceedings ...) ... as a path to greater linguistic understanding and rigour. History: A landmark in modern corpus ... of Present-Day American English in 1967, a work based on the analysis of the Brown Corpus, a carefully ... American Heritage Dictionary, the first dictionary to be compiled using corpus ... . The Survey of English Usage Corpus was used in the development of one of the most important corpus-based grammars, the Comprehensive Grammar of English (Quirk et al. 1985; Quirk, R., Greenbaum ... 1985). The Brown Corpus has also spawned a number of similarly structured corpora: the LOB Corpus (1960s British English), Kolhapur (Indian English), Wellington (New Zealand English), the Australian Corpus of English (Australian English), the Frown Corpus (early 1990s American English), and the FLOB Corpus ... the International Corpus of English, and the British National Corpus, a 100 million word collection ...
The Buckeye Corpus of conversational speech is a speech corpus created by a team of linguists and psychologists at Ohio State University led by Prof. Mark Pitt (Pitt et al. 2005; Raymond et al. 2006; Fosler-Lussier et al. 2007; Dilley & Pitt 2007). It contains high-quality recordings from 40 speakers in Columbus, Ohio conversing freely with an interviewer. The interviewer's voice is heard only faintly in the background of these recordings. The sessions were conducted as sociolinguistic interviews, and are essentially monologues. The speech has been orthographically transcribed and phonetically labeled. The audio and text files, together with time-aligned phonetic labels, are stored in a format for use with the speech analysis software Xwaves and Wavesurfer. Software for searching the transcription files is also available at the project web site. The corpus is available to researchers in academia and industry. The project was funded by the National Institute on Deafness and Other Communication ... Corpus of Conversational Speech (2nd release), http://buckeyecorpus.osu.edu ... Buckeye Speech Corpus Homepage ...
References:
Pitt, Mark, Keith Johnson, Elizabeth Hume, Scott Kiesling, and William Raymond (2005). The Buckeye Corpus of Conversational Speech: Labeling Conventions and a Test of Transcriber Reliability. Speech Communication, 45, 90-95.
Raymond, William D., Robin Dautricourt, and Elizabeth Hume (2006). Word-medial t,d deletion in spontaneous speech: Modeling the effects of extra-linguistic, lexical, and phonological factors. Language Variation and Change, 18(1), 55-97.
Fosler-Lussier, Eric, Laura Dilley, Na'im Tyson, and Mark Pitt (2007). The Buckeye Corpus of Speech: Updates and Enhancements. In Proceedings of Interspeech 2007, Antwerp, Belgium.
Dilley, L., & Pitt, M. (2007). A study of regressive place assimilation in spontaneous speech and its implications for spoken word recognition. Journal of the Acoustical Society of America, 122(4), 2340-2353.
Corpus delicti can refer to: Corpus delicti, a legal term; Corpus Delicti (band), a gothic rock band; Corpus Delicti (album), an album by Die Form.
Corpus Christi ("body of Christ" in Latin) may refer to:
Places and related matters: Corpus Christi, Texas; Corpus Christi Bay; Corpus Christi International Airport; Corpus Christi Independent School District; Naval Air Station Corpus Christi; two vessels of the United States Navy, both named for the city: USS Corpus Christi (PF-44), a Tacoma-class frigate that served in World War II, and USS City of Corpus Christi (SSN-705), a Los Angeles-class submarine in commission as of January 2011 (http://www.nvr.navy.mil/nvrships/details/SSN705.htm); Corpus Christi, Paraguay; Corpus Christi, Tamaulipas.
Educational institutions, university colleges: Corpus Christi College, Cambridge; Corpus Christi College, Oxford; Texas A&M University-Corpus Christi.
Other educational institutions: Corpus Christi College, Belfast, County Antrim, Northern Ireland, United Kingdom; Corpus Christi Catholic College, Leeds, West Yorkshire, United Kingdom; Corpus Christi Catholic High School in Wollongong, New South Wales, Australia; Corpus Christi Catholic Secondary School, in Burlington, Ontario; Corpus Christi College, Melbourne, Victoria, Australia; Corpus Christi College, Perth, Western Australia; Corpus Christi College, Vancouver, British Columbia; Corpus Christi Elementary School, a Catholic school in Chambersburg, PA; Corpus Christi School, Hobart, Tasmania; Pallikoodam, a school formerly known as Corpus Christi High School, in Kottayam, Kerala, India.
Entertainment: Corpus Christi (band), a Christian metal band from Cincinnati, Ohio; Corpus Christi (play), a play by Terrence McNally; Corpus Christi Carol, a Middle English hymn or carol; Corpus Christi Records, a record label.
Religion: Corpus Christi (feast), a Christian feast day, or solemnity.
Habeas corpus (Latin, meaning "you are to have the body") is a writ, or legal action, through which a prisoner ... or by another person coming to his aid. Habeas corpus originated in the English legal system ... safeguarding individual freedom against arbitrary state action. A writ of habeas corpus is a summons ... of habeas corpus. One reason for the writ to be sought by a person other than the prisoner is that the detainee ... habeas corpus (A. V. Dicey, Introduction to the Study of the Law of the Constitution ...). For example ... de libertad. Habeas corpus has certain limitations. It is technically only a procedural remedy; it is a guarantee ... trial is permitted by the law then habeas corpus may not be a useful remedy. Furthermore, in many countries ... of habeas corpus has nonetheless long been celebrated as the most efficient safeguard of the liberty of the subject. The jurist Albert Venn Dicey wrote that the British Habeas Corpus Acts "declare no principle ... on British politics (Routledge, 1994). The writ of habeas corpus is one of what are called the "extraordinary ... Writ of habeas corpus ... arrest warrants in England. The writ is referred to in full in legal texts as habeas corpus ... tibi quod corpus A.B. in prisona nostra sub custodia tua detentum, ut dicitur, una cum die et ... "We command that you have ... . That the basic form of the writs of habeas corpus, now written ... of the writ is often used to distinguish it from similar ancient writs, also called habeas corpus. These include: Habeas corpus ad deliberandum et recipiendum, a writ for bringing an accused from a different ... Habeas corpus ad faciendum et recipiendum (also called habeas corpus cum causa), a writ of a superior ... Habeas corpus ad prosequendum, a writ ordering return with a prisoner for the purpose of prosecuting ...
The Brown University Standard Corpus of Present-Day American English (or just Brown Corpus) was compiled in the 1960s by Henry Kucera and W. Nelson Francis at Brown University, Providence, Rhode Island, as a general corpus (text collection) in the field of corpus linguistics. History: In 1961 1963, Kucera and Francis published their classic ... on what is known today simply as the Brown Corpus. The Brown Corpus was a carefully compiled selection ..., which first appeared in 1969, was the first dictionary to be compiled using corpus linguistics for word frequency and other information. The initial Brown Corpus had only the words themselves, plus a location ... error rate meant that extensive manual proofreading was required. The tagged Brown Corpus ... Oslo-Bergen Corpus. The tagged corpus enabled far more sophisticated statistical analysis ... 7% of the Brown Corpus, "to" and "of" more than another 3% each, while about half the total vocabulary of about 50,000 words are hapax legomena: words that occur only once in the corpus (Kirsten Malmkjær ...) ..., and is known as Zipf's law. Although the Brown Corpus pioneered the field of corpus linguistics, by now typical corpora (such as the Corpus of Contemporary American English, the British National Corpus or the International Corpus of English) tend to be much larger, on the order of 100 million words. Sample distribution: The Corpus consists of 500 samples, distributed across 15 genres in rough ... items such as formulae also had special codes. The corpus originally (1961) contained 1,014,312 words sampled from 15 text categories: A. PRESS: Reportage (44 texts): Political, Sports, Society, Spot News, Financial ... wh-qualifier (how); WRB: wh-adverb (how, where, when). Note that some versions of the tagged Brown corpus contain ... means foreign word.
See also: LOB Corpus, a corpus of British English based on the same parameters as the Brown Corpus. External links: http://khnt.aksis.uib.no/icame/manuals/brown Brown ...
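The word-frequency observations in this entry (a few words accounting for a large share of the text, many words occurring only once) can be reproduced on any token list. The sketch below is an illustration under stated assumptions, not the method used by the Brown Corpus compilers: tokens are simply lower-cased and counted, and the function name `frequency_profile` and the sample sentence are invented for the example.

```python
from collections import Counter


def frequency_profile(tokens):
    """Summarize word frequencies: ranked counts, relative shares, hapax legomena."""
    counts = Counter(tok.lower() for tok in tokens)
    total = sum(counts.values())
    ranked = [
        (rank, word, count, count / total)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]
    # Hapax legomena: words that occur exactly once in the sample.
    hapaxes = sorted(word for word, count in counts.items() if count == 1)
    return {
        "total_tokens": total,
        "vocabulary_size": len(counts),
        "ranked": ranked,
        "hapax_legomena": hapaxes,
    }


# Invented toy sample; a real study would read the corpus files instead.
sample = "the cat and the dog and the bird saw a fish".split()
profile = frequency_profile(sample)
for rank, word, count, share in profile["ranked"][:3]:
    print(rank, word, count, f"{share:.1%}")
print("hapax legomena:", profile["hapax_legomena"])
```

Under Zipf's law the product of rank and frequency stays roughly constant across the ranked list; on a sample this small that pattern is not expected to show, so the sketch only demonstrates the bookkeeping.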
... horn may be an allegory of disease. The Hippocratic Corpus (Latin: Corpus Hippocraticum), Hippocratic ... in the Corpus, none is proven to be by Hippocrates himself, though some sources say otherwise ... been written by one person. But the corpus carries Hippocrates's name as it was attributed to him ..., only the Koan school of ancient Greek medicine that contributed to the Corpus; the Knidian ... 1961, pp. 86-87). Content: The Hippocratic Corpus contains textbooks, lectures, research, notes and philosophical ... be found between works in the Corpus (Singer & Underwood 1962, p. 28). One significant portion of the corpus is made up of case histories, of which there are forty-two. Of these, 60 ... described in the Corpus are endemic diseases: colds, consumption, pneumonia, etc. ... Aphorisms i.1. The writing style of the Corpus has been remarked upon for centuries ... Adams, a translator of the Corpus, goes further and calls it "sometimes obscure". Not all of the Corpus is of this laconic style, though most of it is. It was Hippocratic practice to write in this style (Adams 1891, p. 18). The whole corpus is written in Ionic ... Printed editions: The entire Hippocratic Corpus was first printed as a unit in 1525. This edition was in Latin ... Littré, who spent twenty-two years (1839-1861) working diligently on the Hippocratic Corpus. This was scholarly ... began to appear with Greek text, French translation, and commentary in the Collection Budé. Other ... in the Corpus medicorum graecorum published by the Akademie Verlag in Berlin. The Oath (main article: Hippocratic Oath): The most famous work in the Hippocratic corpus is the Hippocratic Oath, a landmark declaration ..., like many other works from the time period, it is included in the Corpus and named after Hippocrates ... of the Corpus: The Prognostics; On Airs, Waters, and Places; On Regimen in Acute Diseases ...
Browse browse Hippocrates.html Works of the Hippocratic Corpus online, translated by Francis ...
The Calgary Corpus is a collection of text and binary data files, commonly used for comparing data compression ... in 1987 and was commonly used in the 1990s. In 1997 it was replaced by the Canterbury Corpus, but the Calgary Corpus still exists for comparison and is still useful for its original intended purpose.
Contents: In its most commonly used form, the corpus consists of 14 files totaling 3,141,622 bytes, as follows.
Size (bytes)  File name  Description
111,261  BIB  ASCII text in UNIX "refer" format: 725 bibliographic references.
768,771  BOOK1  Unformatted ASCII text: Thomas Hardy, Far from the Madding Crowd.
610,856  BOOK2  ASCII text in UNIX "troff" format: Witten, Principles of Computer Speech.
102,400  GEO  32-bit numbers in IBM floating point format: seismic data.
377,109  NEWS  ASCII text: USENET ... in security.
513,216  PIC  1728 x 2376 bitmap image (MSB first): text in French and line diagrams ...
... which include 4 additional text files in UNIX "troff" format, PAPER3 through PAPER6.
Benchmarks: The Calgary corpus was a commonly used benchmark for data compression in the 1990s. Results were most commonly ... uclc.info calgary corpus compression test.htm UCLC benchmark by Johan de Bock uses this method. For some data compressors it is possible to compress the corpus smaller by combining the inputs into an uncompressed ... between the text files. In other cases, the compression is worse because the compressor handles ... the compressed sizes of the 14 file Calgary corpus using both methods for some popular compression programs ... corpus Compression and SHA-1 crack Challenge (http://mailcom.com/challenge) is a contest started by Leonid A. Broukhis on May 21, 1996 to compress the 14 file version of the Calgary corpus. The contest ... to output files different from the Calgary corpus as long as they hash to the same values as the original ...
External links: Original home of the Calgary Corpus, http://links.uwaterloo.ca/calgary.corpus.html; http://corpus.canterbury.ac.nz ... index.htm Information on the Calgary Corpus; http://mailcom.com/challenge The Calgary corpus ...
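The two benchmark methods mentioned in this entry, compressing each file separately versus compressing the inputs combined into one stream, can be compared with a few lines of code. The sketch below is only an illustration: it uses Python's standard `zlib` module as a stand-in compressor and two invented in-memory "files" rather than the real Calgary files, and the function name `compressed_sizes` is hypothetical.

```python
import zlib


def compressed_sizes(files, level=9):
    """Compare per-file compression with compression of the concatenated inputs.

    `files` maps a file name to its contents as bytes.
    Returns (per_file_sizes, separate_total, combined_size), all in bytes.
    """
    per_file = {name: len(zlib.compress(data, level)) for name, data in files.items()}
    separate_total = sum(per_file.values())
    combined_size = len(zlib.compress(b"".join(files.values()), level))
    return per_file, separate_total, combined_size


# Invented stand-ins for corpus files; real runs would read BIB, BOOK1, etc. from disk.
toy_files = {
    "PAPER_A": b"corpus linguistics studies language in text corpora. " * 40,
    "PAPER_B": b"corpus linguistics studies language in speech corpora. " * 40,
}
per_file, separate_total, combined_size = compressed_sizes(toy_files)
print("per file:", per_file)
print("separate total:", separate_total, "combined:", combined_size)
```

Whether the combined figure comes out smaller depends on the compressor and the data, which is the point the entry makes: shared statistics between similar files can help, while mixing dissimilar files can hurt.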
El Corpus is a municipality in the Honduran department of Choluteca. Coordinates: 13°17′0″N 87°2′0″W.
[Image: human corpus albicans.] The corpus albicans (Latin for "white body") is the regressed form of the corpus luteum. As the corpus luteum is being broken down by macrophages, fibroblasts lay down type I collagen, forming the corpus albicans. This process is called luteolysis. The remains of the corpus albicans may persist as a scar on the surface of the ovary. The corpus albicans is also known as atretic corpus luteum, corpus candicans, or simply as albicans. References: Hiatt, James L.; Gartner, Leslie P. (2001). Color Textbook of Histology. Philadelphia: W.B. Saunders. ISBN 0-7216-8806-3.
The Canterbury Corpus is a collection of files intended for use as a benchmark for testing lossless data compression algorithms. It was created in 1997 at the University of Canterbury, New Zealand, and designed to replace the Calgary Corpus. See also: Data compression. External links: The Canterbury Corpus, http://corpus.canterbury.ac.nz
The Corpus Christianorum (CC) is a major publishing undertaking of the Belgian publisher Brepols devoted to patristic and medieval Latin texts. The principal series are the Series Graeca (CCSG), Series Latina (CCSL), and the Continuatio Mediaevalis (CCCM). There is also a smaller section, the Series Apocryphorum (CCSA), devoted to apocryphal works, and a collection of autographs, the Autographa Medii Aevi (CCAMA). The principal series are seen in some ways as successors to Migne's Patrologiae. External links: Official site, http://www.corpuschristianorum.org
TMC (Tehran Monolingual Corpus) is a large-scale Persian monolingual corpus. TMC is suited for language modeling and related research areas in natural language processing. The corpus is extracted from the Hamshahri Corpus and the website of the Iranian Students News Agency (ISNA). The quality of the Hamshahri corpus is improved for language modeling purposes by a series of tokenization and spell-checking steps. TMC comprises more than 250 million words. The number of unique words that occur two or more times in the corpus is about 300 thousand, which is relatively good for a highly inflectional language like Persian. TMC was created by the Natural Language Processing Lab of the University of Tehran. The corpus is freely available for research use. See also: TEP (Tehran English-Persian parallel corpus); Hamshahri Corpus. External links: Homepage of the NLP Lab, University of Tehran, http://ece.ut.ac.ir/nlp
[Image: Qur'anic manuscript page] Corpus Coranicum is a research project of the Berlin-Brandenburg Academy of Sciences and Humanities to develop a better contextual understanding in the West (the primary audience for the Corpus Coranicum) of the Islamic scripture known as the Qur'an. Begun in 2007, the initial three-year database project is led by Semitic and Arabic studies Prof. Angelika Neuwirth at the Free University of Berlin. The project is currently funded until 2025, but could well take longer to complete (Andrew Higgins and Almut Schoenfeld, "The Lost Archive: Missing for a half century, a cache of photos spurs sensitive research on Islam's holy text", Wall Street Journal, 12 January 2008, http://online.wsj.com/article/SB120008793352784631.html, retrieved 2010-02-07). Goals and methodology: The project will document the Qur'an in its handwritten form and oral tradition, and include an extensive commentary interpreting the text in the context of its historical development (Corpus Coranicum, http://www.bbaw.de/bbaw/Forschung/Forschungsprojekte/Coran/de/Startseite, retrieved 2010-02-07). Much of the Corpus Coranicum source material consists of photographs of ancient Qur'an manuscripts collected before World War II by Gotthelf Bergsträsser and Otto Pretzl. After the Royal ... 10080 Koranplone welcome to the corpus coranicum Corpus Coranicum prospectus. Retrieved ... the heated debate surrounding the text often stands in contrast to an actual knowledge of its contents ... of the shorter suras, teenagers would explore the text through the tools of modern philology while ... that the Corpus Coranicum would spark similar outrage among Muslims, comparing it to the punishment ... pointed out that the Corpus Coranicum project was in any case not directed to Islamic fundamentalism ...
The corpus hemorrhagicum ("bloody body") is a temporary structure formed immediately after ovulation from the ovarian follicle. After the trauma heals, the subsequent structure is called the corpus luteum, which in turn becomes the corpus albicans before degenerating. External links: corpus hemorrhagicum at eMedicine Dictionary; http://www.cvm.okstate.edu/instruction/mm/curr/histology/fr/HiFRp10.htm Image at okstate.edu; http://education.vetmed.vt.edu/Curriculum/VM9124/Diagnostics/UltrasoundPages/CorpusLuteum.htm Image at vt.edu.
The Enron Corpus is a large database of over 600,000 emails generated by 158 employees of the Enron Corporation (Klimt, Bryant and Yiming Yang, "The Enron Corpus: A New Dataset for Email Classification Research", http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.61.1645&rep=rep1&type=pdf) and acquired by the Federal Energy Regulatory Commission during its investigation after the company's collapse ("The Enron Email Corpus", http://sgi.nu/enron, retrieved March 5, 2011). A copy of the database was subsequently purchased for $10,000 by Andrew McCallum, a computer scientist at the University of Massachusetts (Markoff, John, "Armies of Expensive Lawyers, Replaced by Cheaper Software", New York Times, March 5, 2011, p. A1, http://www.nytimes.com/2011/03/05/science/05legal.html?hp). He released this copy to researchers, providing a trove of data that has been used for studies on social networking and computer analysis of language. The corpus is one of the few publicly available mass collections of real emails, as such collections are typically bound by privacy and legal restrictions that make them prohibitively difficult to access (Markoff 2011). External links: http://sgi.nu/enron Enron Corpus website.